The Use and Misuse of 
Econometrics 


(With Reference to SHARP APL) 


David K. Foot 
Andrew North 


INSTITUTE FOR POLICY ANALYSIS 
UNIVERSITY OF TORONTO / TORONTO, CANADA 







© Copyright 1977, David K. Foot and Andrew North,
Second Edition.







PREFACE 


This publication is one of a series of publications arising out 
of the Quantitative Analysis Course. The Quantitative Analysis Course 
was an intensive and experimental five-year teaching program for 
federal government staff involved in policy analysis. It was mounted 
initially in 1971 as a pilot project by the Institute for Policy 
Analysis of the University of Toronto under the sponsorship of the 
Treasury Board Secretariat, and was operated directly by the Secretariat 
over the period 1974 - 1976. 


To date, the publications in this series are: 


David K. Foot, "Policy Problems in Microeconomics" 
David K. Foot and Andrew North, "The Use and Misuse of Econometrics" 
Dale Orr, "Applied Econometrics" 


Dale Orr and Gerry Slusar, "Exercises in Applied Econometrics" 


Dr. A.R. Dobell was the original Course Director and he has 
contributed a Foreword to this volume. 


The Institute for Policy Analysis is pleased to be able to make 
these teaching materials publicly available. 


John A. Sawyer, Director 
Institute for Policy Analysis 
University of Toronto 







FOREWORD 


From the beginning the Quantitative Analysis Course emphasized an 
innovative and tutorial style of instruction. As an experiment in edu- 
cational structures, it was based on "learning-by-doing", and entailed 
the development of special materials in response to the interests of 
participants working in small groups. 


One of the topics in which this sort of innovative approach proved 
most fruitful was econometrics. With an emphasis explicitly on applied 
work, our conviction was that small groups working intensively on realis- 
tic problems, with ample computing support through time-sharing facilities, 
would most effectively develop the necessary mastery of econometric tech- 
nique. There were some costs to this approach: participants in the 
Course spent painful hours punching data matrices into files, or re-running 
regressions rendered useless by minor errors in specification or computa- 
tion. But there were also benefits: those who have been through the 
Course have experienced the work of applied econometrics from the initial 
argument as to the appropriate hypothesis and selection of variables, 
through data assembly and estimation, to a final oral defence of the esti- 
mated forms. They know something of the data available in databanks such 
as CANSIM, and the procedures for retrieving data series or setting up 
data files, something of the canned computer programmes available, the 
diagnostic tests appropriate, the relevant tests of significance, and the 
interpretation attached to standard computer output formats. While those 
who have experienced this machine-oriented approach to the teaching of 
econometrics will probably not have the theoretical properties of k-class 
estimators at their fingertips, they will have developed, through experi- 
ence, some awareness of the purpose and limitations of quantitative work, 
a lively appreciation of both conceptual and practical pitfalls, and an 
insight into the difficulties of relating statistical technique meaning- 
fully to concrete problems of public policy or programme management. 


Most participants have felt that their grasp of econometrics as a 
tool and as an intellectual discipline is far greater following this 
experience than as a result of conventional graduate courses in the area. 
However that may be, I think it is indisputable that their ability to 
estimate, interpret, and deliver a sensible regression is greater. What 
is perhaps more important, their ability to manage a group of analysts 
with some responsibility for quantitative work is enhanced by their 
appreciation of the scale of the task, the limitations of the tools, and 
the computational or data resources available. Future analysts may thus 
find in them supervisors they can talk to. 


This document, which illustrates the approach we used, grew out 
of lectures and materials developed by David Foot for the econometrics 




sessions in the second and third rounds of the Course (1972 and 1973),
in collaboration with Andrew North of I.P. Sharp Associates Limited.
It is designed to serve as a guide to the analyst using econometrics as a
tool of analysis. It provides a brief description of many econometric
methods, as well as comments and warnings regarding their use. The
concentration is on the application of econometrics, since the theory is
readily available elsewhere. The primary text chosen for reference
purposes is J. Johnston's Econometric Methods, second edition, McGraw-Hill
Book Co., New York, 1972. One main reference, rather than many, was
chosen because of the convenience of notational familiarity and the need
to purchase or refer to only one book. Johnston was chosen because it
presents material at the appropriate level of instruction for this manual
and because of its wide use in university courses. A short bibliography
of alternative texts is provided at the end of the manual.


An attempt has been made to encourage the correct and efficient use 
of econometric analysis. The emphasis has not been placed on the question 
of when this analysis is appropriate, but rather on the discussion of 
techniques available, with an orientation towards the correct application 
of the techniques, under the assumption that the analysis is appropriate. 
Three ingredients are required to encourage this correct and efficient 
use: 


(i) a brief presentation of the underlying theory; 
(ii) an example of the application of the theory; and 


(iii) a discussion of the problems which may arise in 
practice. 


All of these ingredients are presented in this manual, which outlines 
the theoretical problem, provides a brief development of the solution (if 
necessary), with appropriate references to the text, and displays the 
theoretical result needed for application. Wherever possible, this result 
is then applied to a specific example, with a clear, concise identification 
of the computer output provided. Finally, a brief section outlining the 
practical problems that can be encountered in the application of the theory 
is presented. This approach has been taken for all sections of the manual; 
cross references are provided where appropriate. 





The examples of computer output were produced using the SHARP APL 
system of I.P. Sharp Associates Limited. The facility of matrix manipula- 
tion in APL (A Programming Language) makes the language ideal for econo- 
metric work. Many common statistical operations can be performed conveni- 
ently and efficiently using this language on the computer. 


As is common with most co-authored documents, the authors engaged 
in some division of labour in the preparation of this manual. David Foot 
assumed primary responsibility for the sections concerned with the 




theoretical developments and empirical implementation, while Andrew North 
assumed primary responsibility for the development of the user-oriented 
computer programs and the sections dealing with programming in APL. The 
sections containing examples were a joint responsibility. Any corrections, 
comments or suggestions for improvements to the manual will be welcomed by 
the authors. 


As the designer and first director of the Quantitative Analysis 
Course, and subsequently its sponsor within the Treasury Board Secretariat, 
I am pleased that the Course had the opportunity to support the preparation 
of this document, and (speaking for all those associated with the manage- 
ment of the Course) I am grateful for the extensive contribution of Andrew 
North and I.P. Sharp Associates to it. Speaking for myself, I should also 
like to express gratitude to David Foot for the outstanding teaching effort 
that led to these materials. The result illustrates well some of the 
features that were targets in the original design of the Course, and I hope 
that it will prove useful to analysts working on problems in quantitative 
analysis generally, as well as to participants in university programmes 
similar to the Quantitative Analysis Course. I hope also that it, together 
with other materials in this series, may serve to stimulate further develop- 
ments in the teaching of applied policy analysis. 


A.R. Dobell 





PREFACE TO SECOND EDITION 


Since the publication of the first edition of The Use and Misuse 
of Econometrics in February, 1975, we have received a number of sugges- 
tions for additions and improvements to the text. The need for a second 
edition has also increased as a result of the unavailability of the first 
edition for over a year. Numerous small changes designed to improve the 
exposition are evident throughout the text. These include the examples 
which have been estimated using the current format of the programs avail- 
able on the I.P. Sharp system. Somewhat more extensive changes have been 
necessitated by additions and improvements to the public libraries of that 
system. These are evident throughout Chapter III and in Sections 4.5, 6.4 
and 8.5, and in the Appendix of this edition. A number of additions have 
also been made to the text. A new chapter (VII) has been included to 
cover applications of generalized least-squares, which can be implemented 
using an appropriate computer program, and a new section (8.2) has been 
included to cover prediction under these conditions. Other new sections 
with examples have been included to cover Shiller lags (8.7) and three- 
stage least squares (9.6). The annotated bibliography has been updated 
and, at the request of previous users, an index has been added. 


We are grateful to our sponsors, the Institute for Policy Analysis 
and I.P. Sharp Associates Limited, for encouraging and assisting us with 
the production of this revised edition and to Ms. Lorelle Triolo of the 
Institute who prepared this manuscript for publication. We continue to 
solicit corrections, comments or suggestions for improvements to this 
edition. 


David K. Foot 
Andrew North 
October, 1977. 





TABLE OF CONTENTS


CHAPTER I - INTRODUCTION
     Econometrics
     Terminology
     Notation
     Summary

CHAPTER II - DATA MATRICES
     Introduction
     Accessing the CANSIM Data Base
     Constructing Data Matrices Using APL
     Entry of Example Data
     Identification of Data
     Cross-Sections and Time-Series

CHAPTER III - ESTIMATION
     Estimation by Ordinary Least Squares (OLS)
     The REGR Program
     Summary of REGR Output
     Examples
     Estimation with Large Samples
     The REGRESS Program

CHAPTER IV - DATA MANIPULATION
     Introduction
     Addition of a Constant
     Addition of Variables
     Addition of Matrices
     Multiplication by a Constant (Scaling)
     Multiplication by a Variable (Polynomial Regression)
     Multiplication by a Matrix
     Division by a Variable (Scaling and Reciprocal Regression)
     Logarithms (Log Linear Regression)
     Comments on Curvilinear Regression
     Time Trends
     Lags
     Dummy Variables (Seasonality)
     Interactive Dummy Variables
     Prior Restrictions

CHAPTER V - HYPOTHESIS TESTING
     Introduction
     Selection of a Confidence Interval
     t-Tests
     Two Uses of t-Tests
     F-Tests

CHAPTER VI - REGRESSION PROBLEMS IN PRACTICE
     Introduction
     Multicollinearity
     Heteroscedasticity
     Autocorrelation
     The COCHRANEΔORCUTT Program
     The HILDRETHΔLU Program
     Misspecification
     Errors in Variables

CHAPTER VII - GENERALIZED LEAST SQUARES (GLS)
     Introduction
     Heteroscedasticity and Autocorrelation
     The GLS Program
     Grouped Data
     Stochastic Prior Information
     Linear Constraints
     Grouped Equations

CHAPTER VIII - PREDICTION AND DISTRIBUTED LAGS
     Prediction with OLS
     The PREDICT Program
     Prediction with GLS
     Distributed Lags
     Distributed Lags - The Geometric Pattern
     Polynomial Distributed Lags (Almon Lags)
     The PDLAG Program
     Other Distributed Lag Patterns
     Shiller Lags
     The SHILLER Program

CHAPTER IX - INSTRUMENTAL VARIABLES
     Instrumental Variables (IV)
     Instrumental Variables and Lagged Dependent Variables
     Indirect Least Squares (ILS)
     Recursive Models
     Two-Stage Least Squares (2SLS)
     The STAGE2 Program
     Three-Stage Least Squares (3SLS)
     The STAGE3 Program
     9.7 Structurally Ordered Instrumental Variables (SOIV)
     9.8 Principal Components
     The PRINCIPAL Program

CHAPTER X - SUMMARY

APPENDIX - NON-LINEAR ESTIMATION
     The MARQUARDTP Program

BIBLIOGRAPHY

INDEX





CHAPTER I 
INTRODUCTION 


1.1 Econometrics 


Econometrics is concerned with the estimation and testing 
of the parameters of statistical relationships between variables. 
A statistical relationship is a mathematical expression involving 
a random component (error term), while a variable, in this con- 
text, can be defined as "an influence". Relationships are expressed 
as mathematical equations and a collection of related equations 
is defined as a model. A model may consist of only one equation. 


The practice of econometrics involves four main steps: 


     (i)   specification of the model;
     (ii)  collection of the relevant data;
     (iii) estimation of the parameters of the model;
     (iv)  testing hypotheses about the model.


The first step is very important since it defines the variables
involved, the direction of causation and the form of the relationship.
Only linear (or linearised) relationships will be considered in this
manual. 1/ The specification chosen may have its basis in theory or in
institutional behaviour. The second step is often the most time
consuming and it is very important that the data be the relevant data.
Different techniques for carrying out the third step will be presented
and discussed in this manual. The fourth step attempts to judge whether
or not the estimated relationship is a sufficiently realistic
description of the underlying behaviour to enable the model to be used
for the purpose for which it was built. Given the above definition of
econometrics, this manual will be primarily concerned with step (iii)
and, to a lesser extent, step (iv). However, since they are firmly based
on steps (i) and (ii), the importance of these steps cannot be
over-emphasised and will be mentioned where appropriate.

1/ Non-linear estimation is briefly discussed in an Appendix.


1.2 Terminology 


The basic ingredient of econometric analysis is the matrix,
which is a rectangular array of information (data) organized in some
clearly identifiable fashion. Each piece of information, such as
population for Canada (1969), is called an observation or an element
of a matrix. A series of elements, such as the population for all
countries (1969), is called a vector. A series of vectors, such as
population, armed forces and GNP for all countries (1969), is called
a matrix. A matrix is of dimension (or size) [r×c] (read r times
c), where r denotes the number of rows and c the number of
columns. (The mnemonic Roman Catholic is often useful to remember
which comes first.) A vector is therefore a matrix with either
r or c equal to 1, and when both r and c equal 1 a matrix
reduces to a scalar (just an ordinary number). Consequently, both
vectors and scalars are special cases of matrices. In a matrix
of dimension [r×c] there are rc observations, so, for example,
with r = 59 (countries) and c = 3 (population, armed forces,
GNP) there are 177 observations. Data set up in this form is often
referred to as a data matrix.
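
The terminology can be checked directly at the terminal. The following is a minimal sketch in APL, assuming index origin 1; the name X and its contents (zeros used as placeholders) are illustrative only and are not the example data of Chapter II. The function ⍴ reports the dimension with rows first, and ×/ multiplies the two elements of the dimension to give the number of observations rc.

```apl
      X←59 3⍴0        ⍝ hypothetical 59 by 3 data matrix of placeholder zeros
      ⍴X              ⍝ dimension: number of rows, then number of columns
59 3
      ×/⍴X            ⍝ number of observations, r times c
177
      ⍴X[;1]          ⍝ one column taken alone is a vector of 59 elements
59
```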

The above paragraph outlines the terminology that will
be used in this manual. It is beyond the scope of this manual
to present the theory of matrix (or linear) algebra, so the user
is referred to an appropriate text (such as Johnston, Chapter 4)
for this material. Basic familiarity with this theory will be
assumed throughout the manual.


1.3 Notation


The simplest statistical relationship between two variables,
denoted x and y (say), is a linear relationship. (This is step
(i) above.) Information can be collected on each variable and
arranged in a vector (step (ii)). Suppose n observations are
obtained on each variable, which are indexed by i (i = 1, 2,
..., n); then the relationship for a given i can be expressed
as the equation

$$ y_i = \alpha + \beta x_i + u_i \qquad (1) $$

where

     $y_i$ is defined as the dependent variable,

     $x_i$ is the independent or explanatory variable (or
     regressor),

     $u_i$ is a random error term,

and $\alpha$ and $\beta$ are parameters (or coefficients) to be estimated.

In vector notation (1) can be written as

$$ y = \alpha + \beta x + u \qquad (2) $$

where y, x and u are [n×1] vectors and $\alpha$ and $\beta$ are
scalars. $\alpha$ is defined as the intercept 2/ and $\beta$ the slope of
the relationship, as can be illustrated in the following diagram
for positive $\alpha$ and $\beta$.



2/ This is often called the constant term. 


Given estimates of the parameters (step (iii)), denoted $\hat{\alpha}$ and
$\hat{\beta}$, an estimated version of (2) can be written as

$$ y = \hat{\alpha} + \hat{\beta} x + e \qquad (3) $$

with an estimate of y, denoted $\hat{y}$, calculated as

$$ \hat{y} = \hat{\alpha} + \hat{\beta} x $$

so that

$$ y = \hat{y} + e \quad \text{or} \quad e = y - \hat{y} . \qquad (4) $$

This is also illustrated in the above diagram for a given $x_i$.
(In the case illustrated, $y_i > \hat{y}_i$ so that $e_i > 0$.) The [n×1]
vector e is often referred to as the vector of residuals or
estimated errors.
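
Relations (3) and (4) amount to simple vector arithmetic. The following is a minimal sketch in APL with three made-up observations; the values 1 and 2 for the intercept and slope are assumed for illustration, not estimated, and the names X, Y, YHAT and E are hypothetical.

```apl
      X←1 2 3               ⍝ hypothetical explanatory variable
      Y←3.5 4.5 7.5         ⍝ hypothetical dependent variable
      YHAT←1+2×X            ⍝ fitted values using the assumed intercept 1 and slope 2
      YHAT
3 5 7
      E←Y-YHAT              ⍝ residuals, as in (4)
      E
0.5 ¯0.5 0.5
      YHAT+E                ⍝ fitted values plus residuals reproduce Y, as in (3)
3.5 4.5 7.5
```

The second residual is negative because the second observation lies below the assumed line.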

There is no reason why y cannot depend on more than one
explanatory variable. Assuming a linear relationship, with an
intercept term, now denoted $\beta_1$ (where $x_1$ is an [n×1]
vector of ones), and (k-1) explanatory variables denoted
$x_2, x_3, \ldots, x_k$ (each being an [n×1] vector), the multivariate
(or general linear) vector version of (2) can be written as 3/

$$ y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + u \qquad (5) $$

3/ Johnston, pp. 121-122.


or in matrix notation

$$ y = X\beta + u \qquad (6) $$

where

     y is an [n×1] vector (as before),

     X is an [n×k] matrix of explanatory variables
     including $x_1$,

     $\beta$ is a [k×1] vector of parameters to be estimated,
     including the intercept,

and u is an [n×1] vector (as before).

The estimated version of (6) (multivariate version of (3)) is
often referred to as the regression of y on X and can be
written as

$$ y = X\hat{\beta} + e \qquad (7) $$

with $\hat{y}$ and e being defined analogously to (4) above.
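
Written out in full, the matrix relation (6) simply stacks the n observations of (5). In the sketch below the double subscript $x_{ij}$ (observation i on variable j) is an assumption introduced only for this illustration; the first column of X is the vector of ones $x_1$.

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{12} & \cdots & x_{1k} \\
1 & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n2} & \cdots & x_{nk}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
$$

Row i of this system reads $y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i$, which is (5) for a single observation.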


1.4 Summary 


From the above description, the following notation emerges:

     (a) The subscript i (i = 1, 2, ..., n) identifies a
         single observation. There are a total of n such
         observations for each series (or variable). 4/

     (b) The letter y is used to identify the [n×1] vector(s)
         of dependent variable(s) while the letter x identifies
         the [n×1] vector(s) of independent or explanatory
         variable(s) (sometimes called the regressor(s)).

     (c) Vectors are denoted by lower case letters, with $x_1$
         being an [n×1] vector of ones.

     (d) Matrices are denoted by upper case letters.

     (e) Parameters (or coefficients) are denoted by Greek
         letters.

     (f) When there is more than one independent variable, a
         number subscript (on parameters or independent variables)
         is used to identify the associated independent variable.
         The index j may also be used for this purpose.

     (g) Estimates are indicated by 'hats' (e.g., $\hat{\beta}$), except
         for e, which is a (vector) estimate of u and is
         referred to as the vector of residuals.

This notation will be rigidly adhered to throughout this manual.

4/ For time-series data this is often replaced by t (t = 1, 2, ...).


CHAPTER II


DATA MATRICES 1/


2.1 Introduction


Once a specification has been chosen (step (i)), the relevant
data must be collected (step (ii)), and assembled into a
data matrix preparatory to use in estimation (step (iii)). This
chapter presents the salient details regarding the collection of
pertinent data into a data matrix using APL. This data will then
be an input to the estimation procedure presented in the next
chapter.

Consider the multivariate equation (relations (5) to (7))
presented in Chapter I. To estimate this single equation, nk
observations are required, since there are

     n observations on the dependent variable

     and n(k-1) observations on the independent variables.

These can be arranged in a single data matrix by placing the vector
of observations on the dependent variable immediately before those
on the independent variables in an [n×k] partitioned matrix of
the form:

     [y ⋮ X]

where the dotted line indicates an imaginary partition. Note
that for estimation purposes this matrix would require (nk + n)
observations, the additional n observations being a column of
ones. (As a rule, this column is automatically inserted for the
user in computerized estimation packages.) Note also that the
variables may be included in any order, so that the partition need
not occur after the first column.

1/ See the Massager Plus Manual available from I.P. Sharp Associates
   Ltd. for a more detailed presentation of matrix operations
   using APL.
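
The partitioned matrix can be formed with a single catenation. The following is a minimal sketch in APL with n = 4 and k = 3; the names Y, X and D and all of the numbers are hypothetical. The last statement shows the column of ones that an estimation program would normally insert for the user; the exact spacing of displayed matrices may differ between systems.

```apl
      Y←10 12 15 19            ⍝ hypothetical dependent variable, n=4
      X←4 2⍴1 5 2 6 3 8 4 9    ⍝ hypothetical regressors: k-1=2 columns
      D←Y,X                    ⍝ data matrix with y placed before X
      D
10 1 5
12 2 6
15 3 8
19 4 9
      ⍴D                       ⍝ n rows and k columns, hence nk observations
4 3
      ((1↑⍴X)⍴1),X             ⍝ regressors with a column of ones placed first
1 1 5
1 2 6
1 3 8
1 4 9
```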

To this point it has been implicitly assumed that the model
to be estimated is comprised of only one equation. This need not
be the case, although, as will be demonstrated later, more
equations do introduce additional problems. However, in such situations
subscripts can also be introduced on the dependent variables (i.e.,
$y_1, y_2, \ldots, y_G$) and the list of independent variables expanded to
include all independent variables in the model, say $x_2, x_3, \ldots, x_K$. 2/
Then a [n×(G+K-1)] partitioned matrix of the form

     [Y ⋮ X]

can be constructed as the data matrix for the multiple equation
model which is to be estimated. The same procedure is also valid
if, for some reason, there is more than one data measure of the
dependent variable y in the single equation model.

2/ Recall that $x_1$ is a vector of ones and is inserted automatically.
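
The dimension of the multiple equation data matrix can be confirmed in the same way. This is a minimal sketch in APL, assuming G = 2 dependent variables and K-1 = 2 independent variables observed n = 4 times; the names and numbers are hypothetical.

```apl
      Y←4 2⍴10 20 12 22 15 27 19 31   ⍝ hypothetical: G=2 dependent variables
      X←4 2⍴1 5 2 6 3 8 4 9           ⍝ hypothetical: K-1=2 independent variables
      ⍴Y,X                            ⍝ n rows and G+K-1 columns
4 4
```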




The two most common methods of constructing data matrices
will be illustrated here. Most data required by economic analysts 
is available from such periodical publications as the Bank of Canada 
Review and Statistics Canada's Canadian Statistical Review. Since 
much of the data published in these sources is required by many 
users, Statistics Canada has assembled many of the time-series into 
a data base called CANSIM (Canadian Socio-Economic Information 
Management).* A subset of this data base (called the CANSIM Mini-Base) 
consisting of the more generally useful series is available in the 
public libraries of I.P. Sharp Associates Limited and may be accessed 
directly using codes that are available in Statistics Canada's 
Summary Reference Index. Alternatively, the required data may be 
manually entered and reshaped into the required form. An example of 


each method follows. 


2.2 Accessing the CANSIM Data Base 


Suppose it is necessary to obtain the Canadian population 
figures for the years 1966 through 1971. The CANSIM Mini-Base Directory 
indicates that the appropriate data can be found in series D 1. 
In order to retrieve this data, the user must bring into the active 
workspace those functions which perform the retrieval. This can 


be done by typing: 


)LOAD 81 CSUSAGE 


* Registered trademark of Statistics Canada. 




All necessary functions are now available to the user. The function
CSGET retrieves an entire series, but, in this case, only a specific
portion of the series is required. The workspace 81 CSUSAGE
contains a function called EXTRACT which enables a user to align
a series to any point in time. The function is invoked as follows:

      R←A EXTRACT X


where R will contain the series X aligned according to the
parameters A. The first 10 elements of all series are a description
of the series, containing such information as first and last years
of the series, the date of the last update to the series, whether
the series is annual or quarterly, etc. The first 10 elements
are not required in the population example; the drop operator
'↓' will be used to remove them.

The function EXTRACT is useful for this example. To retrieve
the Canadian population figures from the first quarter of 1966 to
the last quarter of 1971 type:

      POP←10↓1966 1 1971 4 EXTRACT CSGET D 1


APL always reads from right to left; thus, the function
CSGET executes before the function EXTRACT. As a result, the
entire series is temporarily retrieved. The left argument
1966 1 1971 4 of EXTRACT then accepts only those elements of
the series required. Finally the first ten elements are dropped.
Now

      ⍴POP
24

indicates that the vector POP has 24 elements (4 quarters for
each of 6 years). The series must now be collapsed from a
quarterly series to an annual series. This is done using a function
called COLLAPSE.

      POP←(4 COLLAPSE POP)÷4
      POP
19997.5 20363.75 20692 20994.25 21287.5 21562

The vector POP now contains the required six annual Canadian
population figures. The following section demonstrates the facility
with which such vectors can be reshaped into data matrices which
can then be used as input to the estimation procedure.
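
The two steps just described, dropping the descriptive elements and averaging the quarters of each year, can be imitated with APL primitives alone. The following is a minimal sketch, not the COLLAPSE function itself: it assumes that COLLAPSE sums each consecutive group of 4 elements (which the division by 4 above suggests), and the series S with its 10 placeholder zeros and 8 made-up quarterly values is hypothetical.

```apl
      S←(10⍴0),100 102 104 106 110 112 114 116   ⍝ hypothetical series: 10 descriptive elements, then 8 quarters
      Q←10↓S                                     ⍝ drop the 10 descriptive elements
      ⍴Q
8
      (+/2 4⍴Q)÷4                                ⍝ one row per year; row sums divided by 4
103 113
```

Reshaping the quarterly vector into one row per year makes the grouping explicit, at the cost of requiring the length of the vector to be an exact multiple of 4.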


2.3 Constructing Data Matrices Using APL 


Suppose the annual population data from all 10 provinces 
has been retrieved from the CANSIM data base for the years 1966 
through 1971, and that the population vectors for each of the 


prairie provinces are defined as follows (see Table 1): 




      MAN
963 963 971 979 983 988
      SASK
955 957 960 958 941 926
      ALTA
1463 1490 1524 1559 1595 1628


To shape this data into a matrix called PRAIRIES, the ⍴
operator is used; to the left of ⍴ is placed the shape of the
desired matrix with the number of rows first and the number of
columns second; to the right of ⍴ are placed the strings of
numbers (vectors) which are to be used in filling that shape in
row-wise order. Thus the statement:

      PRAIRIES←3 6⍴MAN,SASK,ALTA


defines PRAIRIES as a matrix with 3 rows and 6 columns. The first
row contains the population data for Manitoba, the second the data
for Saskatchewan, and the third the data for Alberta. To confirm
this, type the matrix name:

      PRAIRIES
 963  963  971  979  983  988
 955  957  960  958  941  926
1463 1490 1524 1559 1595 1628

If it is required that the data for each province be put
into columns instead of rows, the matrix PRAIRIES could be
transposed (⍉), making columns into rows and rows into columns:




      ⍉PRAIRIES
963 955 1463
963 957 1490
971 960 1524
979 958 1559
983 941 1595
988 926 1628
The dimension or size of the matrix PRAIRIES can be
ascertained by typing:

      ⍴PRAIRIES
3 6

The response indicates that there are 3 rows and 6 columns. An
inquiry as to the dimension of the transposed matrix yields:

      ⍴⍉PRAIRIES
6 3


The transposed population matrix can be reassigned to a
matrix TPRAIRIES:

      TPRAIRIES←⍉PRAIRIES


The row sums of this matrix yield the population totals of the 


three provinces for each year. To obtain these totals, type: 


+/TPRAIRIES 
3381 3410 3455 3496 3519 3542 




To derive the same information from the matrix PRAIRIES, 
the sum must be computed across the first dimension (rows) as 


follows: 


+/[1] PRAIRIES 
3381 3410 3455 3496 3519 3542 
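
The same choice of dimension applies to reductions with functions other than addition. The following is a minimal sketch in APL using the maximum function ⌈ in place of +; the matrix is re-entered here from the three vectors given above so that the example stands on its own.

```apl
      PRAIRIES←3 6⍴963 963 971 979 983 988 955 957 960 958 941 926 1463 1490 1524 1559 1595 1628
      ⌈/PRAIRIES          ⍝ largest population of each province over the six years
988 960 1628
      ⌈/[1]PRAIRIES       ⍝ largest provincial population in each year
1463 1490 1524 1559 1595 1628
```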


Now suppose that the corresponding population vector for 


Ontario has been retrieved (see Table 1): 


      ONT
6961 7127 7262 7385 7551 7703


To append onto TPRAIRIES the column containing population figures
for Ontario, type:

      TPRAIRIES←TPRAIRIES,ONT
      TPRAIRIES
963 955 1463 6961
963 957 1490 7127
971 960 1524 7262
979 958 1559 7385
983 941 1595 7551
988 926 1628 7703


To append a new row onto the matrix PRAIRIES, it must be 
specified that the row is being added along the first dimension, 
as follows: 


      PRAIRIES←PRAIRIES,[1]ONT
      PRAIRIES
 963  963  971  979  983  988
 955  957  960  958  941  926
1463 1490 1524 1559 1595 1628
6961 7127 7262 7385 7551 7703


Entire matrices can be joined in this way. Suppose the
matrix of data for Nova Scotia and New Brunswick has been
constructed as:


NSNB 
756 760 767 775 782 789 
617 620 625 628 627 635 


and it is desired to append onto this matrix one containing data 


for Prince Edward Island and Newfoundland defined as: 


      PEINFLD
109 109 110 111 110 112
493 499 506 514 517 522


This can be done by defining a matrix MARITIMES as: 


      MARITIMES←NSNB,[1]PEINFLD
      MARITIMES
756 760 767 775 782 789
617 620 625 628 627 635
109 109 110 111 110 112
493 499 506 514 517 522


or, analogously, 


      TMARITIMES←(⍉NSNB),⍉PEINFLD
      TMARITIMES
756 617 109 493
760 620 109 499
767 625 110 506
775 628 111 514
782 627 110 517
789 635 112 522


When a matrix such as MARITIMES above has been formed, it
is often necessary to select particular rows or columns
corresponding to particular provinces or particular years. This can be
done by using indexing. When referring to matrices, two subscripts
must be used: the first subscript refers to the row(s) desired,
the second to column(s). The two subscripts must be separated by
a semi-colon. Omission of a subscript implies that all of the data
is desired. Thus:

      PRAIRIES[1;]
963 963 971 979 983 988

requests the first row of the matrix PRAIRIES (all columns), i.e.,
the vector for Manitoba, whereas

      MARITIMES[;1]
756 617 109 493

yields the first column of MARITIMES (all rows), i.e., the year
1966. To obtain the first two observations for Newfoundland (in
the fourth row of the matrix MARITIMES), type:

      MARITIMES[4;1 2]
493 499
Attempts to index with subscripts out of the range of the specified
matrix result in the message INDEX ERROR. A summary of this data
is presented in Table 1.
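
Indexing by position is easier to read when the positions are looked up rather than counted. The following is a minimal sketch in APL, assuming index origin 1; the vector YEARS is a hypothetical set of column labels introduced only for this illustration, and MARITIMES is re-entered so that the example stands on its own.

```apl
      MARITIMES←4 6⍴756 760 767 775 782 789 617 620 625 628 627 635 109 109 110 111 110 112 493 499 506 514 517 522
      YEARS←1966 1967 1968 1969 1970 1971   ⍝ hypothetical labels for the six columns
      YEARS⍳1969                            ⍝ position of 1969 among the columns
4
      MARITIMES[;YEARS⍳1969]                ⍝ all four provinces in 1969
775 628 111 514
      MARITIMES[1 2;6]-MARITIMES[1 2;1]     ⍝ change from 1966 to 1971, Nova Scotia and New Brunswick
33 18
```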




Table 1

Population of Canada by Province, 1966-1971

                                 Province

Year  NFLD  PEI   NS   NB   QUE   ONT  MAN  SASK  ALTA  BC
1966   493  109  756  617  5781  6961  963   955  1463  [illegible]
1967   499  109  760  620  5864  7127  963   957  1490  1945
1968   506  110  767  625  5928  7262  971   960  1524  2003
1969   514  111  775  628  5985  7385  979   958  1559  2060
1970   517  110  782  627  6013  7551  983   941  1595  [illegible]
1971   522  112  789  635  6028  7703  988   926  1628  2185




2.4 Entry of Example Data 


Now suppose we are dealing with data that is not available 
in the CANSIM data base and therefore must be entered manually. 
For example, consider the armed forces, population, and gross 
national product (GNP) for 59 countries for the year 1969. This 
data matrix, illustrated in Table 2, will form the basis for the 
examples presented in later chapters. Countries are ranked by 
population size, which is measured in millions of persons. The 
size of armed forces is measured in thousands of persons. The 
data is to be shaped into a matrix of 59 rows and 4 columns, the 
first column being an identification, the second the population 
in millions, the third armed forces in thousands, the fourth GNP 
in billions of U.S. dollars. The data can be entered by row or 
by column. To enter the data by row, proceed as follows: 


      A←750 2780 80 550 930 42 244 2955 466 205 3161 932
      A←A,128 325 16 118 365 10 104 259 167 59 466 150 56
      A←A,390 109 54 413 82 51 506 140 39 33 1 36 155 6 35
      A←A,478 14 33 282 27 33 242 41 33 288 6 32 645 8 28
      A←A,161 9 28 143 2 22 433 2 21 238 11 21 93 67 20 181
      A←A,20 18 481 3 17 129 32 16 50 3 14 523 5 14 413 3 14
      A←A,168 28 14 57 3 13 121 28 13 85 32 11 48 4 10 186
      A←A,5 10 102 14 10 95 22 9 159 8 9 149 8 9 95 3 7 49
      A←A,12 7 36 20 7 124 2 6 87 2 6 28 19 5 21 2 5 39 9 5
      A←A,45 14 4 41 10 3 67 0.2 3 75 5 3 13 5 3 16 2 2 42
      A←A,1 2 60 4 2 15 2 2 15 2 2 26 1 0.3 1 2
      A←59 3⍴A


The first line above stores a string of numbers (called a 
vector) in a variable called A. Subsequent lines take what 
already exists in A, catenate a further string of numbers onto 
A, and store the longer resulting string back into A. The last 
line restructures the vector A into a matrix called A with 59 rows 
and 3 columns. 

Table 2 

Columns: Country, POP, AF, GNP 

Countries (in order of population size): 

CHINA 
INDIA 
USSR 
USA 
PAKISTAN 
INDONESIA 
JAPAN 
W. GERMANY 
U.K. 
ITALY 
FRANCE 
PHILIPPINES 
THAILAND 
TURKEY 
SPAIN 
POLAND 
U.A.R. 
S. KOREA 
IRAN 
BURMA 
N. VIETNAM 
YUGOSLAVIA 
CANADA 
ROMANIA 
S. VIETNAM 
E. GERMANY 
MOROCCO 
TAIWAN 
N. KOREA 
CZECHOSLOVAKIA 
ALGERIA 
NETHERLANDS 
AUSTRALIA 
MALAYSIA 
PORTUGAL 
HUNGARY 
BELGIUM 
GREECE 
BULGARIA 
IRAQ 
AUSTRIA 
SAUDI ARABIA 
CAMBODIA 
SYRIA 
SWITZERLAND 
TUNISIA 
FINLAND 
DENMARK 
NORWAY 
LAOS 
ISRAEL 
NEW ZEALAND 
LEBANON 
ALBANIA 
JORDAN 
SINGAPORE 
LIBYA 
MONGOLIA 
LUXEMBOURG 


To enter the data by column, proceed as follows: 


      A←750 550 244 205 128 118 104 59 56 54 51 39 36 35
      A←A,33 33 33 32 28 28 22 21 21 20 18 17 16 14 14 14
      A←A,14 13 13 12 10 10 10 9 9 9 7 7 7 6 6 5 5 5 4 3 3
      A←A,3 3 2 2 2 2 2 0.3 2780 930 2955 3161 325 365 259
      A←A,466 390 413 506 33 155 478 282 242 288 645 161
      A←A,143 433 238 93 181 481 129 50 523 413 168 57 121
      A←A,85 48 186 102 95 159 149 95 49 36 124 87 28 21 39
      A←A,45 41 67 75 13 16 42 60 15 15 26 1 80 42 466 932
      A←A,16 10 167 150 109 82 140 1 6 14 27 41 6 8 9 2 2
      A←A,11 67 20 3 32 3 5 3 28 3 28 32 4 5 14 22 8 8
      A←A,3 12 20 2 2 19 2 9 14 10 0.2 5 5 2 1 4 2 2 1 2
      A←⍉3 59⍴A

In this case the final line restructures the vector A into 
a matrix with 3 rows and 59 columns, transposes it, and stores it 
in A. Thus A again is a matrix with 59 rows and 3 columns. 
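
The equivalence of the two entry orders can be checked outside APL. 
The following is a minimal sketch in Python (not SHARP APL), using 
only the first four countries from the statements above; the helper 
names reshape and transpose are illustrative assumptions, and reshape 
here assumes the vector has exactly rows x cols elements. 

```python
# (POP, AF, GNP) for China, India, USSR and USA, as entered in the text.
by_row = [750, 2780, 80, 550, 930, 42, 244, 2955, 466, 205, 3161, 932]
by_col = [750, 550, 244, 205, 2780, 930, 2955, 3161, 80, 42, 466, 932]


def reshape(vector, rows, cols):
    """Fill a rows-by-cols matrix row by row (hypothetical helper)."""
    return [vector[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Entry by row: reshape directly into 4 rows and 3 columns.
a_from_rows = reshape(by_row, 4, 3)

# Entry by column: reshape into 3 rows and 4 columns, then transpose.
a_from_cols = transpose(reshape(by_col, 3, 4))

print(a_from_rows == a_from_cols)
```

Either route gives one row per country and one column per variable, 
which is the layout assumed in the rest of the chapter. 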


2.5 Identification of Data 
To insert a column of row identification, type 

      A←(59 1⍴⍳59),A
      ⍴A
59 4

This defines a [59x1] matrix consisting of the numbers 
1 through 59, catenates the matrix A to this matrix, and stores 
the resulting matrix in A. The expression ⍴A returns the shape 
of the matrix A: 59 rows and 4 columns. A is presented as Table 3 
at the end of this chapter. 

Note that the expression ⍳59 returns for a result the numbers 
1 through 59. Any type of monotonic sequence can be generated using 
the operator ⍳, called iota. For example, to generate the even 
numbers between 1 and 10, type 

      2×⍳5
2 4 6 8 10

Or, to generate the odd numbers between 1 and 20, in decreasing 
order, type 

      ⌽¯1+2×⍳10
19 17 15 13 11 9 7 5 3 1

The operator reversal, ⌽ above, simply reverses the order 
of a sequence of numbers. (⌽ is formed by overstriking the upper 
case O with the upper case M.) 
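
For readers more familiar with other languages, the same sequences 
can be sketched in Python (an illustration only, not SHARP APL). The 
function name iota and the three-row sample matrix are assumptions 
made for this example. 

```python
def iota(n):
    """Return the integers 1 through n (hypothetical helper)."""
    return list(range(1, n + 1))


# Even numbers between 1 and 10: twice each of 1..5.
evens = [2 * i for i in iota(5)]

# Odd numbers between 1 and 20 in decreasing order:
# build 1, 3, ..., 19 and then reverse the sequence.
odds_desc = [2 * i - 1 for i in iota(10)][::-1]

# Prefix an identification column to a small sample matrix,
# one identification number per row.
sample = [[750, 2780, 80], [550, 930, 42], [244, 2955, 466]]
with_id = [[i] + row for i, row in zip(iota(len(sample)), sample)]

print(evens)
print(odds_desc)
print(with_id)
```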

Note should be taken that there are two minus signs on 
the APL keyboard. To denote a number as being negative, use the 
"high minus" (upper case 2). To change the sign of all numbers 
in a vector, use the minus sign (upper case +). For example, 


get "sn Ss Ga 
- D -v 
q 2% ges 4,2 
=X 
"CS SS 2 4.3 


Column identification is handled as follows: 

      POP←A[;2]
      AF←A[;3]
      GNP←A[;4]

These three statements assign the second column (all rows) 
of A to the variable POP, the third column of A to the variable 
AF, and the fourth column of A to the variable GNP (see Table 2). 


2.6 Cross-Sections and Time-Series 


The data discussed in this chapter is presented in Tables 
1 and 2. The data in Table 3 will be used in subsequent chapters 
to illustrate the techniques under discussion. The correct 
partitioning of A will be demonstrated at that time. Note that this 
example is a cross-section of observations (countries) at a point 
in time (1969). The population example presented earlier in this 
chapter was a time-series for each province (over six years), but 
also contained data for each of the provinces for each year (i.e., 
six cross-sections). Consequently, although the data in Table 2 
represents one cross-section on three variables, the population 
data presented in Table 1 can be considered as either time-series 
or cross-sections, or more correctly a time-series of cross-sections. 


Table 3 

The data matrix A (columns: ID, POP, AF, GNP) 


CHAPTER III 
ESTIMATION 


With the construction of the data matrix completed, the 
model can now be estimated (step (iii)). The most important 
results are summarized in this chapter. 


3.1 Estimation by Ordinary Least Squares (OLS) 


Consider the multivariate linear model (equations (5) to 
(7)) presented in Chapter I, where there is one dependent variable 
and k-1 independent variables. It can be shown 1/ that with the 
following assumptions on the error term 

(i)   zero mean:  $E(u) = 0$ 
(ii)  constant variance and zero covariance:  $E(uu') = \sigma^2 I_n$, 

where $I_n$ is an [nxn] identity matrix; and on the matrix X: 

(iii) non-stochasticity:  X is fixed 
(iv)  sufficient information:  X has rank k < n 

minimizing the sum of squared residuals denoted SSR (or RSS) (hence 
the name least squares) 

$$SSR = \sum_{i=1}^{n} e_i^2 = e'e = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - \hat{\beta}'X'y$$

(where e is defined in Chapter I, result (4)), results in the 
following estimates of the coefficients: 

$$\hat{\beta} = (X'X)^{-1}X'y \qquad (8)$$

with variance-covariance matrix 

$$var(\hat{\beta}) = \sigma^2 (X'X)^{-1} \qquad (9)$$

(where a prime (') denotes a transpose and a superscript minus 
one (-1) an inverse). Note that $(X'X)^{-1}$ has dimension [kxk] 
while X'y has dimension [kx1], so that $\hat{\beta}$ is the desired 
dimension [kx1]; that is, the k parameter estimates. Assumption 
(ii) includes two assumptions - homoscedasticity and no 
autocorrelation, while assumption (iv) requires sufficient observations 2/ 
and also rules out exact multicollinearity. Assumption (iii) enables 
conditional estimates and predictions to be made based on the data 
X. This rules out errors in variable measurement. The implications 
of violating these assumptions will be examined in Chapter VI. 

1/ See Johnston, pp. 122-126. 
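
As an illustration of result (8), the following Python sketch (not 
SHARP APL) computes the least squares coefficients from the normal 
equations for a small subsample: the first seven countries entered in 
Chapter II, with AF regressed on POP and GNP. The subsample choice 
and the helper functions transpose, matmul and inverse are assumptions 
made for this example, and the numbers are not the results of the 
full 59-country regression. 

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two conformable list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


pop = [750, 550, 244, 205, 128, 118, 104]
af = [2780, 930, 2955, 3161, 325, 365, 259]
gnp = [80, 42, 466, 932, 16, 10, 167]

X = [[1.0, p, g] for p, g in zip(pop, gnp)]   # column of ones first
y = [[float(v)] for v in af]

Xt = transpose(X)
Xty = matmul(Xt, y)
beta = matmul(inverse(matmul(Xt, X)), Xty)    # result (8)

fitted = [sum(x * b[0] for x, b in zip(row, beta)) for row in X]
e = [yi[0] - fi for yi, fi in zip(y, fitted)]

ssr = sum(v * v for v in e)                   # e'e
yty = sum(v * v for v in af)
ssr_alt = yty - sum(b[0] * c[0] for b, c in zip(beta, Xty))  # y'y - b'X'y
print([round(b[0], 4) for b in beta], round(ssr, 2))
```

The two expressions for SSR agree up to rounding error, and the 
residuals are orthogonal to every column of X, which is the property 
that defines the least squares solution. 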


These estimates are best (minimum variance), linear, unbiased 
estimates (BLUE). 3/ Since $\sigma^2$ is unknown, it must be estimated 
with the unbiased estimator, denoted $S^2$, defined as 

$$S^2 = \frac{e'e}{n-k} = \frac{SSR}{n-k}. \qquad (10)$$

S is called the standard error of the estimate, denoted SEE, while 
(n-k) is called the degrees of freedom (the number of observations 
minus the number of estimated parameters). SEE is measured in the 
same units as the dependent variable and therefore cannot be 
compared across equations estimated using different dependent variables 
or estimated over different sample periods. Sometimes for such 
comparison across equations a unit free measure called the 
coefficient of variation for an equation can be used. It is only defined 
at some value of the dependent variable and the average (or mean) 
is usually used for this purpose, that is, 

$$CV = \frac{SEE}{\bar{y}} \qquad (11)$$

where $\bar{y}$ is the mean of the dependent variable in the regression. 4/ 
CV is usually expressed in percentage terms. 

The standard error on each coefficient (denoted $S_j$) is 
obtained by substituting $S^2$ into the variance-covariance matrix 
(9) and taking the square root of the k (variances on the) 
diagonal elements (including the intercept). Traditionally, these 
are then used to calculate t-statistics for each coefficient, which 
test the null hypothesis 5/ that the true coefficient is zero; that 
is 

$$H_0: \beta_j = 0$$

against the alternative hypothesis that it is not zero 

$$H_1: \beta_j \neq 0$$

or that it is positive ($H_1: \beta_j > 0$) or that it is negative 
($H_1: \beta_j < 0$). The former is called a two-tail test while the 
latter two are called one-tail tests. This t-statistic, with (n-k) 
degrees of freedom, is calculated as 

$$t_j = \frac{\hat{\beta}_j}{S_j} \qquad (12)$$

where $\hat{\beta}_j$ is obtained from (8) and $S_j$ from (9) as outlined above. 
A high t-value leads to a rejection of the null hypothesis in 
favour of the alternative. This will be discussed in more detail 
in Chapter V. 

2/ A good intuitive general rule is n≥10k; that is, at least 
ten observations for each parameter to be estimated. Technically 
all that is required for a result is n>k. 

3/ See Johnston, pp. 125-127. 

4/ This is analogous to a CV for a single variable which divides 
the standard deviation of the variable by its mean. 
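
Results (10) to (12) can be traced through numerically. The sketch 
below is in Python (not SHARP APL) and assumes a simple regression of 
AF on POP for the first seven countries entered in Chapter II; with 
one regressor, the slope element on the diagonal of $(X'X)^{-1}$ 
reduces to one over the sum of squared deviations of the regressor, 
which is what the code uses. The variable names are illustrative. 

```python
import math

pop = [750, 550, 244, 205, 128, 118, 104]      # regressor (millions)
af = [2780, 930, 2955, 3161, 325, 365, 259]    # dependent variable (thousands)
n, k = len(af), 2                               # k includes the intercept

x_bar = sum(pop) / n
y_bar = sum(af) / n
sxx = sum((x - x_bar) ** 2 for x in pop)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pop, af))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
resid = [y - intercept - slope * x for x, y in zip(pop, af)]

s2 = sum(r * r for r in resid) / (n - k)   # result (10)
see = math.sqrt(s2)                        # standard error of the estimate
cv = 100 * see / y_bar                     # result (11), in percent
se_slope = math.sqrt(s2 / sxx)             # from the diagonal of result (9)
t_slope = slope / se_slope                 # result (12)

print(round(see, 2), round(cv, 2), round(t_slope, 3))
```

Note that SEE carries the units of AF (thousands of persons), whereas 
CV and the t-statistic are unit free. 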


The total sum of squares of the dependent variable (or 
squared deviations around the mean), denoted TSS, is obtained as 

$$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 = y'y - n\bar{y}^2 \qquad (13)$$

and therefore the total sum of squares (TSS) minus the residual 
sum of squares (RSS) 6/ can be defined as the sum of squares 
explained (ESS) by the regression, or 

ESS = TSS - RSS (or SSR). 

The coefficient of multiple correlation or R-squared is defined as 
the proportion of the TSS explained by the estimated equation, or 

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}. \qquad (14)$$

Since $R^2$ is a proportion of sums of squares it can never be 
negative and will always be between 0 and 1 assuming an intercept 
is included in the regression. If an intercept is not included 
difficulties can arise. 7/ Because RSS and TSS are not unbiased 
estimates of the variances of the error (numerator) and dependent 
variable (denominator) respectively, an $R^2$ adjusted for degrees 
of freedom, called R-bar-squared, is calculated as follows: 

$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}. \qquad (15)$$

Note that $R^2$ will never decrease as additional independent 
variables are added to a regression, whereas $\bar{R}^2$ will decrease if 
the t-statistic on the variable is less than one in absolute value. 
In other words, the addition of a new independent variable will 
almost always increase $R^2$, but will only increase $\bar{R}^2$ if it has 
a |t| value greater than one. 8/ Note that this result only 
considers one variable at a time, so if a variable is eliminated on 
the basis of a |t| < 1, the equation must be re-estimated before 
judgement can be passed on the remaining variables. In all cases 
$R^2 \geq \bar{R}^2$. Note also that $\bar{R}^2$ can be negative, but $R^2$ cannot (as 
long as the relationship contains an intercept term; again see 
footnote 11, Chapter IV). 

5/ Hypothesis tests are discussed in detail in Chapter V. This 
test is based on a fifth assumption that the errors (u) are 
normally distributed. 

6/ Note that $\bar{e} = 0$, so $\sum (e_i - \bar{e})^2 = \sum e_i^2 = e'e$ (RSS). Note 
Johnston, footnote 1, p. 129. This is the number that is 
minimized by the least squares estimation procedure. 

7/ See footnote 11, Chapter IV. 
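
The adjustment in result (15) can be written equivalently as 
$1 - (1 - R^2)(n-1)/(n-k)$, which is convenient when only $R^2$ is at 
hand. The Python sketch below (not SHARP APL; the function name is 
an assumption) applies it to the simple correlation of 0.72176 
reported for the regression of AF on POP in Section 3.3, with n = 59 
and k = 2, and then to a hypothetical poor fit to show that the 
adjusted measure can be negative. 

```python
def r_bar_squared(r2, n, k):
    """Adjust R-squared for degrees of freedom, as in result (15)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# One regressor plus an intercept: R-squared equals the squared
# simple correlation between AF and POP.
r2 = 0.72176 ** 2
adjusted = r_bar_squared(r2, n=59, k=2)

# Hypothetical weak regression: few observations, low R-squared.
weak = r_bar_squared(0.01, n=10, k=3)

print(round(r2, 4), round(adjusted, 4), round(weak, 4))
```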

The division of the total sum of squares (TSS) into an 
explained and an error component can be extended even further, 
since within the explained component the source and contribution 
of each of the independent variables can be clearly identified 
and, therefore, so can subsets of independent variables. This is 
called an analysis of variance. 9/ Each successive entry shows the 
increment in the explained sum of squares due to the addition of the 
new variable given that the preceding variables have already been 
included. Remember that the TSS refers to deviations of the dependent 
variable around its mean (see (13)) and not just y'y ($= \sum y_i^2$). 

These calculations are the ingredients of further tests of 
hypotheses involving the F-statistic. The most common of these 
tests is a test of the overall relationship excluding the intercept; 
that is, testing the null hypothesis 

$$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$$

against the alternative hypothesis 

$$H_1: \text{any } \beta_j \neq 0 \text{ for } j = 2, \ldots, k.$$

It can be shown 10/ that the appropriate F-statistic with (k-1) 
and (n-k) degrees of freedom can be calculated from $R^2$ using 
the formula 

$$F[(k-1), (n-k)] = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}. \qquad (16)$$

Note that as $R^2$ approaches 0, so does F, and a high $R^2$ 
results in a high F. Note also that the degrees of freedom for 
the numerator (k-1) is the number of constraints which are imposed, 
which is the number of equal signs in $H_0$. If the calculated 
F exceeds the critical value - as it nearly always does - the null 
hypothesis is rejected in favour of the alternative (see Chapter V 
for details). Note that this is nothing but the ratio of the 
explained sum of squares to the residual sum of squares adjusted for 
the degrees of freedom. Similarly, this approach can be extended 
to cover any subgroup of coefficients. By taking the ratio of 
(say) the sum of squares associated with the last (k-r) variables 
divided by (k-r) to the sum of squared residuals divided by (n-k) 
(i.e., $S^2$), an F distribution with [(k-r), (n-k)] degrees of 
freedom is obtained. When the degrees of freedom for the numerator 
is reduced to 1, the F-statistic is equivalent to a t-statistic 
squared. To understand this note that the null hypotheses ($H_0$) 
become the same. Consequently, the traditional t-test of a 
regression coefficient (described above) is equivalent to testing whether 
the addition to the explained sum of squares due to adding the 
single variable $X_j$ is significantly large in relation to the sum 
of squared residuals given the other independent variables in the 
equation. 

8/ See Y. Haitovsky, "A Note on the Maximization of $\bar{R}^2$", The 
American Statistician, Vol. 23 (1969), p. 20. 

9/ Johnston, pp. 143-146. 

10/ Johnston, pp. 142-143. 
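
The link between result (16) and the squared t-statistic can be 
checked with the figures quoted for the regression of AF on POP in 
Section 3.3 (simple correlation 0.72176, t-value 7.873, n = 59, 
k = 2). This is a Python sketch, not SHARP APL, and the function name 
is an assumption; small differences arise only because the quoted 
figures are rounded. 

```python
def f_overall(r2, n, k):
    """F-statistic for the overall relationship, as in result (16)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


r2 = 0.72176 ** 2          # with one regressor, R-squared = r squared
f_stat = f_overall(r2, n=59, k=2)

t_value = 7.873            # t-statistic on the single slope coefficient
print(round(f_stat, 2), round(t_value ** 2, 2))
```

With a single numerator degree of freedom the two quantities test 
the same null hypothesis, so they coincide up to rounding. 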


Finally, a useful test statistic for assessing the degree 
of violation of the autocorrelation aspect of assumption (ii) is 
the Durbin-Watson statistic, denoted d and defined as 

$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}. \qquad (17)$$

(Since this statistic usually only makes sense for time-series data, 
it is often defined using the subscript t.) This and the other 
hypothesis tests will be discussed further in Chapter V. 
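
A direct transcription of result (17) is sketched below in Python 
(not SHARP APL). The function name and the two residual series are 
hypothetical and chosen only to show how the statistic responds to 
the pattern of the residuals. 

```python
def durbin_watson(e):
    """Sum of squared successive differences over sum of squares."""
    numerator = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator


# Hypothetical residuals that alternate in sign (negative autocorrelation).
alternating = [1, -1, 1, -1]

# Hypothetical residuals that come in runs (positive autocorrelation).
runs = [1, 1, -1, -1]

print(durbin_watson(alternating), durbin_watson(runs))
```

Residuals that alternate in sign push d upward, while runs of 
residuals with the same sign pull it downward. 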




The information in a data matrix can be summarized in a 
correlation matrix, which is a square matrix. If the dependent 
and the (k-1) independent variables are all included the matrix 
is [kxk]. If this data is arranged in a partitioned matrix of 
the form [y:X] (that is, with the dependent variable first), 
the simple correlation between the dependent and any independent 
variable ($x_j$) can be defined as 

$$r_{1j} = \frac{\sum (x_j - \bar{x}_j)(y - \bar{y})}{\sqrt{\sum (x_j - \bar{x}_j)^2 \, \sum (y - \bar{y})^2}} \qquad (18)$$

where $\bar{x}_j$ is the mean of the $x_j$ series and the summations are 
over i (i = 1, 2, ... , n) observations. This information will 
then be presented in the first row and/or column of the correlation 
matrix. The same definition holds for simple correlations between 
any two variables and can be computed by replacing y by $x_k$ (say). 
From the formula it should be clear that if y is replaced by 
$x_j$, then $r_{jj} = 1$. (Similarly, if $x_j$ is replaced by y, $r_{11} = 1$.) 11/ 
Any variables can appear in any order in a correlation matrix - the 
choice of both is left to the user. Note that if only one 
independent variable ($x_j$) appears in the equation, $R^2 = r_{1j}^2$. The 
significance of a simple correlation coefficient can be tested 
with a t-statistic with (n-2) degrees of freedom computed as follows: 12/ 

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \qquad (19)$$

where r is any simple correlation coefficient ($r_{ij}$). 

Each of the above results can be computed easily using 
APL; all are available in a multiple regression program on the 
APL system of I.P. Sharp Associates Limited. A discussion of the 
output from this program is presented in the following section. 

11/ $r_{11}$ is the top left hand element of the correlation matrix, 
since it is in the first row and the first column. 
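
Results (18) and (19) are sketched below in Python (not SHARP APL); 
the function names are assumptions. The t-statistic is evaluated at 
the correlation of 0.72176 between AF and POP quoted in Section 3.3 
(n = 59), and the correlation function is checked on a small 
hypothetical series that is exactly linear. 

```python
import math


def correlation(x, y):
    """Simple correlation coefficient between two series, result (18)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t-statistic with n-2 degrees of freedom, result (19)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical exactly linear data: the correlation is 1.
r_linear = correlation([1, 2, 3, 4], [3, 5, 7, 9])

# Correlation between AF and POP quoted in the text.
t_stat = correlation_t(0.72176, 59)

print(round(r_linear, 6), round(t_stat, 3))
```

The resulting t-value agrees, up to rounding, with the t-statistic 
on the slope coefficient in the corresponding simple regression. 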


The REGR Program 


The SHARP APL program (function) which performs a simple or 
multiple regression analysis is called REGR. In order to execute this 
program, the user must first make accessible in the active workspace 
the program and its required sub-programs (other functions essential 
to a successful execution of REGR). Workspaces are the blocks of 
storage into which the SHARP APL public libraries are organised; the 
user must access that storage block which contains the REGR program. 
This is done by typing 

      )LOAD 32 REGRESSION


12/ Johnston, p. 36. 




At this point, all required functions are available to the user, but 
the data upon which the analysis is to be performed is not. The data 
matrix should now be either constructed manually, as described in 
Chapter II, or copied from another existing workspace (block of storage). 
The latter can be accomplished by means of the )COPY command. For 
example, the statement 

      )COPY DATAWS X Y

serves to copy the objects X and Y from the (previously) stored workspace 
called DATAWS. These variables are then immediately accessible to the 
user in the active workspace. 

The REGR program can perform a simple or multiple regression 
analysis. If the equation specification to be estimated is to contain 
no intercept (constant term), so that the estimated regression line is 
constrained to pass through the origin, an additional state setting must 
be invoked prior to the execution of REGR (see Section 4.15). 

The general form of the statement required to actually execute 
the program is 

      Y REGR X

where Y contains the values of the dependent variable and X is a matrix 
whose columns contain the independent variables. Each column in X 
represents a variable such that the first variable to be entered into 
the regression equation is in the first column, the second variable to 
be entered is in the second column, and so on. There is no need to 
specify a column of ones in X since this is performed by the program; 
thus, X is of order [nx(k-1)]. Note that in the case of a simple 
regression, when the equation contains only one independent variable, 
X is of order [nx1]. As an example, consider the statement 

      DATA[;1] REGR DATA[;2 3]

This initiates a regression of column 1 of the matrix DATA (the 
dependent variable) on columns 2 and 3 (the independent variables, or 
regressors). Similarly, the statement 

      AF REGR ⍉2 59⍴POP,GNP

performs the regression of the variable AF on the variables POP and GNP. 
For a discussion of the meaning of ⍉2 59⍴POP,GNP see Chapter II. 

Certain sections of the results produced by REGR have been made 
optional outputs since they contain non-essential, although very useful, 
information. These optional features include an analysis of variance 
table, a correlation matrix, a list of residuals, and a 
variance-covariance matrix of estimated coefficients. The user who does not 
require this information for the particular regression being estimated 
can benefit from the faster printing and more compact results 
that follow from the suppression of these outputs. 

The printing or suppression of optional output features is 
controlled by state setting functions, as is the exclusion of an intercept 
from the regression specification. The current state may be displayed 
at any time by typing 

      STATE

All state settings can be restored to their default values 
by typing 

      DEFAULT

These default values are 

      CONSTANT
      ANOVA
      CORRELATION
      NORESIDUALS
      NOCOVARIANCE

indicating that the analysis of variance table and the correlation 
matrix are to be displayed, but that printing of the residuals and the 
variance-covariance matrix of the estimated coefficients is to be 
suppressed. An intercept term is assumed to be included in the 
specification, although it can be eliminated (see Section 4.15 below). Results 
produced by the various settings of the state are described below. 

The setting ANOVA (which has the suppression alternative NOANOVA) 
results in the calculation and printing of the following analysis of 
variance table: 


SOURCE OF          DEGREES OF     SUM OF          F-STATISTIC 
VARIANCE           FREEDOM        SQUARES 

Mean               1              $n\bar{y}^2$ 
Regressor: $X_j$     1              $ESS_j$           $F_j$      (j = 2, ..., k) 
Residual           n-k            e'e 
TOTAL              n              y'y 

where e'e is the sum of squared residuals (SSR or RSS), $ESS_j$ is the 
incremental sum of squares explained by the addition of $X_j$ to the 
equation given that $X_1$ through $X_{j-1}$ are already included, and $F_j$ is 

$$F_j[1, n-k] = \frac{ESS_j}{e'e/(n-k)} \quad \text{for } j = 2, \ldots, k$$

with 1 and n-k degrees of freedom. (The associated hypothesis 
tests are discussed in Chapter V.) Recall that $TSS = y'y - n\bar{y}^2$ 
(result (13) above); consequently $R^2$ (result (14)) and, 
therefore, $\bar{R}^2$ (result (15)) and F (result (16)) can be easily 
calculated from this table. 

The $F_j$ printed in the last column are called sequential 
F-statistics, i.e., $F_3$ provides a test as to the significance 
to the regression of the independent variable $x_3$ given that the 
variables $x_2$ and $x_1$ have already been included in the regression equation 
(recall that $x_1$ is the column of ones representing the intercept term). 
Thus the sequence of columns in the input matrix of independent variables 
affects certain of the results in this analysis of variance table, although 
it has no bearing upon the estimated coefficients or the resulting equation 
statistics. 

The setting CORRELATION (which can be negated by the alternative 
setting NOCORRELATION) produces a matrix containing the results presented 
in (18) and (19) above, with the simple correlation coefficients (result 
(18)) in the upper triangular portion, and the associated t-statistics 
(result (19)) in the lower triangular portion. The first row and column 
refer to correlations involving the dependent variable (see footnote 11). 
This matrix is useful for assessing the sign and strength of the correlation 
of the dependent variable with each of the independent variables (separately) 
as well as the intercorrelations among the independent variables. The latter 
are important with respect to the problem of multicollinearity (see Chapter 
VI). 

The fourth default setting of the state is NORESIDUALS. Invocation 
of the alternative setting RESIDUALS produces the information in result (4) 
above at the end of the computer output for the estimated regression; i.e., 
the [nx1] vectors $y$, $\hat{y}$, and $e$ ($= y - \hat{y}$) are printed as adjacent 
columns in a table. This information can be useful in determining which 
observations are not being predicted accurately (those with large errors), 
and is also pertinent to the problem of heteroscedasticity (see Chapter VI). 




The fifth and final default state setting of NOCOVARIANCE is 
nullified by the alternative setting COVARIANCE. The latter produces a 
symmetric [kxk] matrix (result (9)) which is printed following the re- 
gression output, but preceding the list of residuals. The diagonal elements 
are the variances of the estimated coefficients (intercept term first) 
while the off-diagonal $S_{ij}$ for $i \neq j$ is the covariance between 
coefficient i and coefficient j . Most analysts will probably not 
require this information, since its main utility lies in the performance 
of more complicated hypothesis tests concerning possible relationships 
between the estimated coefficients (see Chapter V). 

In addition to the four optional output tables described above, 
the REGR program prints a variety of standard regression statistics. 
Regardless of the settings of the state, immediately upon execution of 
the REGR program the user is reminded to 

      ALIGN PAPER

to facilitate future use of the output. Pressing the RETURN key then 
produces the standard regression output described below, as well as any 
optional output features dictated by the settings of the state functions 
described above. 

The first piece of information printed is the mean of the 
dependent variable ($\bar{y}$). This precedes the following table: 


VARIABLE     MEAN        ESTIMATED       STANDARD     t-VALUE 
                         COEFFICIENT     ERROR 

$X_j$          $\bar{x}_j$       $\hat{\beta}_j$            $S_j$          $t_j$ 

for j=1, 2, ..., k. This includes the information in results 
(8), (9) and (12) described above. 

This table is followed by the analysis of variance table 
(when requested), and finally by the pertinent statistical information 
concerning the regression: 

      $R^2$ (result (14))         SEE (result (10)) 
      $\bar{R}^2$ (result (15))         d (result (17)) 
      F (result (16))          CV (result (11)) 

Recall that $R^2$ is a proportion, SEE is measured in the same units as the 
dependent variable, and CV (calculated at $\bar{y}$) is expressed as a percentage. 

One final caveat should be mentioned before terminating the discussion 
of the use of the REGR program. The error message DOMAIN ERROR is printed 
when the program attempts to invert a singular matrix. This occurs whenever 
the same independent variable has been included twice or, more generally, 
whenever any of the independent variables is an exact linear combination of 
any or all of the other independent variables. The econometric term applied 
to this mathematical difficulty is the problem of multicollinearity (see 
Section 6.2). 
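
Why the inversion fails can be seen from the determinant of X'X. 
The Python sketch below (not SHARP APL, and not the REGR program 
itself) uses the first four countries entered in Chapter II and the 
hypothetical helpers gram and det3; integer arithmetic keeps the 
determinants exact. 

```python
def gram(columns):
    """Form X'X from a list of the columns of X."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


ones = [1, 1, 1, 1]               # intercept column
pop = [750, 550, 244, 205]
gnp = [80, 42, 466, 932]

# Distinct regressors: X'X can be inverted.
det_regular = det3(gram([ones, pop, gnp]))

# The same variable included twice: X'X is singular.
det_repeated = det3(gram([ones, pop, pop]))

# A regressor that is an exact multiple of another: also singular.
det_multiple = det3(gram([ones, pop, [2 * p for p in pop]]))

print(det_regular, det_repeated, det_multiple)
```

A zero determinant means that $(X'X)^{-1}$ in result (8) does not 
exist, so no unique set of coefficient estimates can be computed. 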




3.2 Summary of REGR Output 


In summary, six main pieces of information are available 
from the REGR program: 

      1.* correlation matrix 
      2.  estimated coefficients, etc. 
      3.* analysis of variance table 
      4.  equation statistics 
      5.* variance-covariance matrix 
      6.* estimated residuals (errors) 

with the starred (*) items optional and displayed dependent upon 
the selected settings of the state. The most important of the 
information is contained in items 2 and 4 above. The information 
contained in items 5 and 6 can be retained in the workspace for 
future use by assigning the variables ASE and AE respectively 
to some matrix. For example, 

      ERR←AE

stores the estimated residuals in an [nx1] vector called ERR. 
These may be useful for later work 13/ and as an input for plots 
(for a discussion of the APL plotting package, see the manual 
SHARP APL Plot Facility, available from I.P. Sharp Associates). 


13/ For example, see Chapter VI, where autocorrelation and hetero- 
scedasticity are discussed. 




3.3 Examples 


Consider the sample data matrix A, presented at the end 
of Chapter II, with column mnemonics 

      ID, POP, AF, GNP 

representing identification, population, armed forces and gross 
national product, respectively. The observations (i = 1, 2, ... , 59) 
represent 59 different countries in the world. 

Suppose that the problem is to explain the size of the 
armed forces (AF) of different countries of the world. AF then 
becomes the dependent variable in the regression. Further, suppose 
that the analyst believes that a linear specification is appropriate 
and that at least one explanatory variable (regressor) is the size 
of a country's population (POP). It is to be expected that a 
positive relationship exists between the variables AF and POP, 
since a larger country (in terms of population) would have more 
people available to enrol in the armed forces. Thus, it is expected 
that $\beta_2$ should be positive. 

Complete results are shown in Example 1. Note that 

(i)   $r_{12}^2 = (0.72176)^2 = 0.5209 = R^2$ since there is only one 
      independent variable. 

(ii)  $t_2^2 = (7.873)^2 = 61.9842 = F_2$ since the numerator of $F_2$ 
      has only 1 degree of freedom. 

(iii) $t_2$ = the t-statistic in row 2, column 1 of the correlation 
      matrix, since the two statistics are in fact testing 
      the same null hypothesis. 


Example 1 

[Computer output of the statement AF REGR POP under the state 
settings ANOVA, CORRELATION, NORESIDUALS and COVARIANCE: correlation 
matrix (with t-values), mean of dependent variable, estimated 
coefficients with standard errors and t-values, analysis of variance 
table, equation statistics, and variance-covariance matrix of 
estimated coefficients.] 


(iv)  $F_2 = F$ since there is only one regressor. 

(v)   $TSS = y'y - n\bar{y}^2$ 
          = 30,552,505 - 6,499,896.4 
          = 24,052,608.6 
          = $ESS_2$ + RSS (e'e) 

Also note that $\hat{\beta}_2 > 0$, confirming the prior beliefs. 

Now, to illustrate the output for the multiple variable 
case (Example 2), suppose an additional explanatory variable, 
the size of a country's GNP, is introduced in an effort to help 
further explain the size of a country's armed forces. Presumably 
the population variable is designed to capture the ability to find 
people for the forces, whereas the GNP variable is designed to 
capture the ability to finance the forces. Thus, it is to be ex- 
pected that each of the estimated slope coefficients will be positive, 
since an increase in either population or GNP will probably result 
in an increase in size of that country's armed forces. 

The estimated coefficients shown in Example 2 are both 
positive, confirming the prior beliefs. A comparison of the 
computed statistics in Example 2 and the formulae given in results 
(11) through (16) above might prove beneficial as an exercise. 
For example, note that 

$$CV = \frac{100 \times SEE}{\bar{y}} = \frac{100 \times 229.391}{331.915} = 69.11 \text{ (percent)}$$

A discussion of the appropriate hypotheses which may be tested 
from the results in Example 2 is presented in Chapter V. 


Example 2 

[Computer output of the statement AF REGR ⍉2 59⍴POP,GNP under the 
state settings ANOVA, CORRELATION, NORESIDUALS and COVARIANCE: 
correlation matrix (with t-values), mean of dependent variable, 
estimated coefficients with standard errors and t-values, analysis 
of variance table, equation statistics, and variance-covariance 
matrix of estimated coefficients.] 


3.4 Estimation with Large Samples - The REGRESS Program 


As mentioned in Section 3.1, the public libraries on the SHARP 
APL system are organised into workspaces, or blocks of storage. Each 
workspace is of the same finite size, namely approximately 100,000 
bytes of addressable storage in the memory of the computer. Each data 
matrix which the user chooses to save in a workspace fills up a portion 
of the total bytes (100,000) allocated to that workspace. Eventually, 
given sufficient data, the amount of space remaining in the workspace 
becomes so small as to render program execution impossible, i.e., 
100,000 bytes is no longer sufficient to contain the original data, the 
necessary programs, and any intermediate results produced temporarily 
during the execution of the programs. An attempted program execution 
under these circumstances results in the error message WS FULL. As a 
result, the REGR program is constrained as to the number of observations 
available for analysis. 

To obviate this possible difficulty an additional workspace has 
been created in the SHARP APL public libraries containing a function to 
perform simple or multiple regression analyses on data matrices
containing any number of observations. To access this workspace, type

      )LOAD 32 FILEREG

The program which performs the analysis is called REGRESS, and has been
designed specifically to handle those situations in which there are too
many available data points for the REGR program. 14/ All input data is
stored in a SHARP APL file. As a result, use of the program requires
a certain familiarity with the SHARP APL file subsystem. (This necessary
familiarity can be easily achieved by reference to the "SHARP APL File
Subsystem Instruction Manual", available at no charge from I.P. Sharp
Associates.) The data may be stored in any number of matrix components,
each of which may have any (possibly varying) number of rows, provided
that the number of columns in each component remains constant. The
columns represent the variables; the dependent variable may be in any
column.

14/ Note that the REGRESS program should be used only when the volume
of data is quite large. The REGR program described in Section 3.1
should be used as an alternative whenever possible.


The program is executed by typing 
REGRESS N 


where N is the number of observations upon which the analysis is to be
performed. If there are M observations in the file and an analysis of 
N<M is requested, the first N observations of the existing M are 
processed. If N>M, an analysis is performed on the existing M and 
an appropriate message follows the output. 

The following input is requested: 

(1) The tie number of the file containing the data. 

(2) Column numbers in the data matrix of the dependent and independent
variables. These numbers must refer to the column numbers in the
transformed matrix if any transformations are performed. These
numbers may not correspond to those in the original input data
matrix (see below).

(3) Whether or not any transformation is to be performed on the
data (see below concerning the transformation subfunction).

(4) Whether or not the Durbin-Watson statistic is required. It
should be noted that its calculation necessitates a second
pass through the data file, thereby increasing expense. Its
inclusion in the output should be requested judiciously.


An analysis can be performed on transformations of the original 
input data by first defining a specific transformation subfunction 
called TRANSFORM. This subfunction must be niladic (i.e., have no 
arguments), and return no explicit result. It must assume that the 
data component from the original file is called X and must ensure
that the resulting transformed matrix is also called X. For example,
suppose that the original data components consist of four columns and
that column 1 (the dependent variable) is to be regressed against
column 2 and column 3 times column 4. The necessary subfunction
would then be

      ∇TRANSFORM
[1]   X[;3]←X[;3]×X[;4]
[2]   X←0 ¯1↓X∇

Note that in this situation the dependent variable is still in column 1, 
but that the required independent variables are now in columns 2 and 3 
of the transformed input matrix. 
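
The effect of such a TRANSFORM subfunction can also be sketched outside APL. The following Python fragment is an illustration only: the function name `transform`, the list-of-rows layout and the sample numbers are assumptions made here and are not part of the FILEREG workspace.

```python
def transform(x):
    """Replace column 3 by column 3 times column 4, then drop column 4.

    `x` is a list of rows, each row a list of four numbers. Columns are
    numbered from 1 in the text and from 0 in Python.
    """
    return [[row[0], row[1], row[2] * row[3]] for row in x]


# Hypothetical data component with four columns.
component = [
    [10.0, 1.0, 2.0, 3.0],
    [12.0, 2.0, 4.0, 5.0],
]
transformed = transform(component)
print(transformed)  # two rows of three columns
```

As in the APL version, the dependent variable stays in the first column and the two regressors end up in the second and third columns.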

Following a reminder to ALIGN PAGE, a depression of the RETURN
key results in the printing of the results illustrated on the next page.




[Printout of a REGRESS session, reproduced sideways in the original
and not legible in this copy. The prompts and headings that can still
be made out are: tie number of data file; dependent variable (column
number); independent variable(s) (column number(s)); transformation
required (yes or no); Durbin-Watson statistic required (yes or no);
ALIGN PAGE; mean of dependent variable; variable and coefficient
listing with constant term; source of variation (mean, regression,
residual, total); F-statistic for significance of regression; standard
error of the estimate; multiple correlation coefficient; coefficient
of variation.]




CHAPTER IV 
DATA MANIPULATION 


4.1 Introduction 


In Chapter II the entry of data, including the creation of 
an identification vector, was discussed. In this chapter a number 
of additional techniques for data creation and manipulation in 
APL are presented, using as examples a vector V and a matrix M 
defined as follows: 

      V←1 7 4 9 3 10

      M←3 3⍴7 6 5 2 1 4 2 3 5

Note that if it is required for future use, any newly-created data
must be saved; it should either be stored separately or appended
to an existing data matrix (see Chapter II for details).


4.2 Addition of a Constant 


      3+V
4 10 7 12 6 13
      ¯1+M
6 5 4
1 0 3
1 2 4

This operation is seldom used in econometric analysis.
However, one such application might be to convert an identification
vector (1, 2, ...) into an identification referring to years (1947,
1948, ...) by adding the constant 1946 (see section 4.11 below).

4.3 Addition of Variables 


      V+V
2 14 8 18 6 20

Remember that it usually only makes sense to add variables
measured in the same units.


4.4 Addition of Matrices 


      M+M
14 12 10
 4  2  8
 4  6 10


This is an unlikely operation in econometrics. 


4.5 Multiplication by a Constant (Scaling) 


      2×V
2 14 8 18 6 20
      .5×M
3.5 3   2.5
1   0.5 2
1   1.5 2.5

This is an essential operation in econometrics since it
enables the user to scale the variables and therefore the coefficients.
Note, for example, that

    $$\beta x = (10\beta)\left(\frac{x}{10}\right)$$

so that dividing a variable by 10 (or multiplying by 0.1) scales
the regression coefficient $\beta$ upwards by a factor of 10. It is
essential that variables be scaled so that the estimated coefficients
obtained from the analysis are both meaningful and free from
rounding error in the computer. For example, it is not good practice
to measure GNP in thousands of dollars and obtain a coefficient
of 0.000000366, which when rounded to 0.000 for purposes of presentation
in a report becomes meaningless. Rather, GNP should be scaled
downwards and measured in billions of dollars, so that the estimated
coefficient is reported as 0.366. Consequently, note that the
magnitude of an estimated coefficient is a function of the units
of the associated independent variable.
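
The scaling rule can be checked numerically. The sketch below is a minimal illustration, assuming a simple regression with an intercept; the helper `slope` and the data are hypothetical and are not part of the REGR program.

```python
def slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data, for illustration only.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [3.0, 5.5, 6.5, 9.0, 11.0]

b = slope(x, y)                            # coefficient on x as measured
b_scaled = slope([v / 10 for v in x], y)   # coefficient after dividing x by 10
print(b, b_scaled)
```

Dividing the regressor by 10 leaves the fitted values unchanged and multiplies the estimated coefficient by 10.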


4.6 Multiplication by a Variable (Polynomial Regression) 


      V×M
RANK ERROR
      V×M
       ∧
      V+.×V
256
      V×V
1 49 16 81 9 100




The first operation above results in a RANK ERROR, since
an attempt is being made to multiply a vector of length six by a
[3x3] matrix, i.e., the two variables have different ranks. It
is only possible to pre-multiply M by an [nx3] matrix or to
post-multiply M by a [3xn] matrix, in which case the APL operator
+.× would be used (rather than ×). This operator is an example
of an inner product, and is simply the APL formulation of the
algebraic concept of matrix multiplication. It is illustrated in
the second statement above, which returns the sum of the squares of
the elements of V (i.e., $\sum_i v_i^2$). These operations are of little
use with regard to data creation and manipulation, but are essential
to the computations performed by the APL programs.1/

1/ For example, in computing the sum of squared residuals and
the variance-covariance matrix (see page 27).

The third operation, however, is very useful for data creation
since it allows non-linearities to be introduced into a linear model.
For example, given an independent variable $x_j$, this operation
permits the creation of

    $$x_j^2 = (x_j)(x_j)$$

or higher powers of $x_j$. When such a function of $x_j$ is
introduced into a relationship, say

    $$y = \beta_1 + \beta_2 x_j + \beta_3 x_j^2 + u$$

the basic relationship between y and $x_j$ becomes quadratic in
nature. The exact form of the relationship depends on the sign
and magnitude of the estimated coefficients. Note, however, that
this is still a linear regression, since by (arbitrarily) defining

    $$z_j = x_j^2$$

the relationship

    $$y = \beta_1 + \beta_2 x_j + \beta_3 z_j + u$$

can then be estimated. This is no different from having another
(linear) independent variable. The basic shape of the relationship
is dictated by the signs on $\beta_2$ and $\beta_3$ and the position by $\beta_1$.

For example, consider the following diagrams:

[Diagram 1 and Diagram 2: plots of y against $x_j$ for two quadratic
relationships; not reproduced.]




In diagram 1, $\beta_1$ and $\beta_3$ are positive, whereas in diagram 2
both $\beta_1$ and $\beta_3$ are negative. Note that the turning point


(minimum or maximum) occurs when

    $$\frac{dy}{dx_j} = \beta_2 + 2\beta_3 x_j = 0$$

or when

    $$x_j = -\frac{\beta_2}{2\beta_3}$$

Consequently, if the turning point is to occur in the first quadrant
(for positive $x_j$, as illustrated) either $\beta_2$ or $\beta_3$ must
be negative (but not both). Of course there is no reason why the
turning point must be in the first quadrant since only part of the
curve may be relevant for the particular application contemplated
(such as to the right of point A on diagram 1). This type of operation
can be easily extended beyond the quadratic to approximate a
polynomial of any desired degree. There will be a new independent
variable for each addition to the order of the polynomial (a quadratic
has 2 independent variables, a cubic 3, etc.).
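
The turning-point condition is easy to evaluate once the coefficients have been estimated. The following sketch assumes hypothetical coefficient values; `turning_point` is an illustrative helper, not a function of the APL workspace.

```python
def turning_point(b2, b3):
    """Value of x at which y = b1 + b2*x + b3*x**2 has zero slope."""
    if b3 == 0:
        raise ValueError("no turning point: the relationship is linear")
    return -b2 / (2 * b3)


# Hypothetical coefficients: b2 negative and b3 positive (a minimum).
x_star = turning_point(-4.0, 0.5)
print(x_star)

# With b2 and b3 of the same sign the turning point lies at negative x.
x_outside = turning_point(4.0, 0.5)
print(x_outside)
```

Opposite signs on the two coefficients place the turning point at a positive value of x, as required for the first quadrant.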


4.7 Multiplication by a Matrix 


      M×M
49 36 25
 4  1 16
 4  9 25

This is an unlikely operation in econometrics, but is essential
to the computations performed by the APL programs.


4.8 Division by a Variable (Scaling and Reciprocal Regression) 


Division by a constant has already been covered under section 
4.5 above since, for example, dividing by 2 is equivalent to multi- 
plying by 0.5. Division by a matrix, like multiplication, is an 
unlikely operation in econometrics. However, division by a variable 
is an essential operation (as shall be demonstrated in Chapter VI). 
The most common example of division by a variable is 

      ÷V
1 0.1428571429 0.25 0.1111111111 0.3333333333 0.1

This is simply 1÷V, although it is not necessary to explicitly
type the 1. This operation is generally referred to as scaling
(or deflating) by a variable.

This common example is often applied in the estimation of
a curvilinear reciprocal relationship; that is, as the independent
variable (x) increases, the dependent variable (y) decreases, but
not in a linear fashion. Consider the relationship

    $$y = \beta_1 + \beta_2 \frac{1}{x_j}$$

and note that as $x_j$ gets larger, $1/x_j$ gets smaller and y approaches
$\beta_1$ (called an asymptote). Diagrammatically the relationship between
y and $x_j$ looks like:

[Diagram: y plotted against $x_j$ for $\beta_2$ positive, approaching
the asymptote $\beta_1$; not reproduced.]

For small $x_j$, a change in $x_j$ will result in a larger change
in y than will the same change in $x_j$ for large $x_j$ (see
illustration). For positive $\beta_2$, the relationship will have a negative
slope; for $\beta_2$ negative, the relationship will have the same
curvilinear properties, but its slope will be positive. (Note that
this coefficient relationship is the opposite of the same relationship
in a linear regression.) $\beta_1$ should now be interpreted as
an asymptote, rather than as an intercept; in the diagram $\beta_1$
is positive.
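
The behaviour of the reciprocal form can be seen with a few evaluations. This is a minimal sketch with hypothetical coefficients; `reciprocal_model` is an illustrative name only.

```python
def reciprocal_model(x, b1, b2):
    """y = b1 + b2 * (1/x), defined for x != 0."""
    return b1 + b2 / x


b1, b2 = 2.0, 10.0  # hypothetical coefficients, b2 positive

# Effect on y of a unit increase in x, at a small and at a large x.
change_small_x = reciprocal_model(2.0, b1, b2) - reciprocal_model(1.0, b1, b2)
change_large_x = reciprocal_model(101.0, b1, b2) - reciprocal_model(100.0, b1, b2)
print(change_small_x, change_large_x)

# For very large x the fitted value is close to the asymptote b1.
print(reciprocal_model(1e9, b1, b2))
```

Both changes are negative because the coefficient on 1/x is positive, and the change at small x is much the larger in absolute value.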


4.9 Logarithms (Log Linear Regression) 


(a) Natural logarithm 


      ⍟V
0 1.945910149 1.386294361 2.197224577 1.098612289 2.302585093
      ⍟M
1.945910149  1.791759469 1.609437912
0.6931471806 0           1.386294361
0.6931471806 1.098612289 1.609437912


(b) Base N logarithm


      10⍟V
0 0.84509804 0.6020599913 0.9542425094 0.4771212547 1
      2⍟M
2.807354922 2.584962501 2.321928095
1           0           2
1           1.584962501 2.321928095


A DOMAIN ERROR results from an attempt to take logs of zero or
negative numbers. These operations are required to estimate
relationships that are multiplicative in form, such as

    $$y = A x_j^{\beta_j}$$

which can be linearised by taking logs of both sides so that

    $$\log y = \alpha + \beta_j \log x_j$$

where $\alpha = \log A$; or, where e denotes exponential base,

    $$y = A e^{\beta_j x_j}$$

which becomes

    $$\log y = \alpha + \beta_j x_j .$$

These are called the double log (or log-log) and semi-log
transformations, respectively, and are further examples of curvilinear
regression.2/ These transformations are easily extended to the
multivariate case (where economists often refer to the first
formulation above as the Cobb-Douglas function). Note that in estimating
the first relationship it does not matter which base logs are used,
provided all variables use the same base, except that different
assumptions are involved regarding the error term. If u is
introduced additively in the equation for log y, then $e^u$ or $N^u$
(natural and base N, respectively) appears multiplicatively in the
equation for y; remember that the least squares assumptions
are made on the error in the linearised (additive) form, not on the
error in multiplicative form.

2/ A good concise treatment is given in Johnston, pp. 50-52.

The second relationship must use natural logarithms (that is, log
to the base e, often denoted ln). In each of these cases the
dependent variable is log y so that all the statistical information
($R^2$, etc.) refers to variation in log y, which is not the same as
variation in y.3/ This means that the SEE and therefore the CV are
very difficult to interpret since they involve square roots of the
logs of variables. It also means that the statistical information is
not comparable with a linear equation on the same variables. Note,
however, that the statistical information for the different semi-log
form

    $$y = \alpha + \beta_j \log x_j$$

is strictly comparable to the linear equation on the same variables
since the dependent variable is the same (y).
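
The double-log linearisation can be sketched as follows. The data are hypothetical and are generated without an error term so that the recovered parameters can be checked; `simple_ols` is an illustrative helper and natural logs are assumed.

```python
import math


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical data generated exactly from y = 3 * x**0.5.
x = [1.0, 4.0, 9.0, 16.0, 25.0]
y = [3.0 * v ** 0.5 for v in x]

# Regress log y on log x; the slope estimates the exponent.
alpha, beta = simple_ols([math.log(v) for v in x], [math.log(v) for v in y])
A = math.exp(alpha)  # antilog of the estimated intercept
print(A, beta)
```

With real data the fit would not be exact, and the residuals would refer to log y rather than to y.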


Diagrammatically these relationships have the following
shapes:

(1) Double log: $\log y = \alpha + \beta_j \log x_j$

[Diagrams of y against $x_j$ for $\beta_j$ positive and for $\beta_j$
negative; not reproduced.]

Note that when $\beta_j = 1$, $y = A x_j$ and when $\beta_j = -1$,
$y = A / x_j$ (called a rectangular hyperbola).

(2) Semi-log 1: $\log y = \alpha + \beta_j x_j$

[Diagrams of y against $x_j$ for $\beta_j$ positive and for $\beta_j$
negative; not reproduced.]

Note that this is the familiar exponential growth pattern when the
logs are natural logs, since $y = A e^{\beta_j x_j}$.

(3) Semi-log 2: $y = \alpha + \beta_j \log x_j$

[Diagrams of y against $x_j$ for $\beta_j$ positive and for $\beta_j$
negative; not reproduced.]

As we can see from these diagrams, the logarithmic transformation
provides a wide choice of curvilinear relationships in regression
analysis.

3/ This is discussed further in the next section.




4.10 Comments on Curvilinear Regression 


The previous four sections (4.6 to 4.9) have been concerned
with data creation and manipulation that can be used to estimate
curvilinear relationships. It is useful to note that the same
basic shape may be available from more than one functional form.
In addition, other independent variables may appear in the regression,
in which case the previous diagrams indicate the partial
relationship between the y variable and the particular $x_j$ variable
in question, holding all other independent variables fixed.
Also, it is possible to combine the different curvilinear forms,
but the underlying functional shapes become complex. For example,
the logarithmic-reciprocal relationship

    $$\log y = \alpha + \beta_j \frac{1}{x_j}$$

has the following shape for negative $\beta_j$

[Diagram of y against $x_j$ for the logarithmic-reciprocal form; not
reproduced.]

with a point of inflection at $x_j = -\beta_j/2$ and an upper asymptote
at $e^{\alpha}$ (if the logs are natural logs).
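
The inflection point and the asymptote of the logarithmic-reciprocal form can be checked by direct differentiation. The following short derivation assumes natural logs and writes $\beta$ for $\beta_j$ and $x$ for $x_j$:

```latex
\begin{align*}
y &= e^{\alpha + \beta / x} \\
\frac{dy}{dx} &= -\frac{\beta}{x^{2}}\, y \\
\frac{d^{2}y}{dx^{2}} &= \left( \frac{\beta^{2}}{x^{4}} + \frac{2\beta}{x^{3}} \right) y
                       = \frac{\beta\,(\beta + 2x)}{x^{4}}\, y \\
\frac{d^{2}y}{dx^{2}} = 0 \;&\Longrightarrow\; x = -\frac{\beta}{2}
\end{align*}
```

The inflection point is at a positive value of x only when $\beta$ is negative. As x becomes large, $\beta/x$ tends to zero and y tends to $e^{\alpha}$; with $\beta$ negative, y lies below $e^{\alpha}$ for every positive x, which is why $e^{\alpha}$ is an upper asymptote.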

Once again it is important to emphasize that when the dependent
variable is in terms of logs all the statistical information
about the equation is based on minimizing the variation in log y,
which is quite different from the variation in y itself. This
includes the mean of the dependent variable, $R^2$, $\bar{R}^2$, F,
SSR, SEE and the CV. Moreover, the latter two statistics are
especially difficult to interpret since they involve taking the
square root of the SSR which is itself the square of a magnitude
expressed in units of log y.

One way of attempting to overcome this problem and thus be
able to compare two specifications with y and log y as dependent
variables is to recalculate the equation statistics. Consider a
specification estimated with log y as the dependent variable. The
predicted values (log $\hat{y}$) can be converted into original units
($\hat{y}$) by taking the appropriate antilogs. A new vector of residuals
can be calculated (see below) and the formulae presented in Chapter
III recalculated using this new vector of residuals (that is, results
(10), (11), (13), (14), (15), (16) and (17)). This information is
then comparable to the information obtained from equations estimated
on the same sample of observations but with y as the dependent
variable.




However, it is important to remember that:

(a) The error term enters the original specification
multiplicatively, so that the errors should be calculated
as ratios (not differences) of y to $\hat{y}$; that is,
since

    $$y = A x_j^{\beta_j} u$$

where A = antilog $\alpha$, then

    $$e = y / \hat{y}$$

(b) The theoretical error term (u) must be assumed to
be randomly distributed with a mean of one (and a
finite variance) so that the transformed error (log u)
which appears in the estimating equation is randomly
distributed with a zero mean (and a finite variance).
(It is at this stage in econometrics that the lognormal
distribution becomes important.) As a consequence of
these two conditions, the equation statistics calculated
using e will be based on a vector of residuals with
nonzero mean (note that if $y = \hat{y}$, e = 1), which in
turn creates new problems since e'e/(n-k) is no longer
an estimate of the variance of residuals. Instead this
magnitude should be calculated as $(e-\bar{e})'(e-\bar{e})/(n-k)$
where $\bar{e}$ is the mean of the estimated vector of
residuals e. Note that $\bar{e}$ should be near one.
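
Points (a) and (b) can be sketched as a small calculation. The code below is only an illustration of the arithmetic and is not the CONVERT function; the helper name, the data and the choice of k = 2 estimated parameters are assumptions.

```python
def ratio_residual_variance(y, y_hat, k):
    """Variance of the ratio residuals e = y / y_hat about their own mean.

    Uses n - k degrees of freedom, where k is the number of parameters
    estimated in the original (log) regression.
    """
    e = [a / b for a, b in zip(y, y_hat)]
    n = len(e)
    e_bar = sum(e) / n
    variance = sum((v - e_bar) ** 2 for v in e) / (n - k)
    return e_bar, variance


# Hypothetical observed values and antilogged predictions.
y = [10.0, 12.0, 15.0, 21.0, 24.0]
y_hat = [10.5, 11.5, 15.5, 20.0, 25.0]

e_bar, variance = ratio_residual_variance(y, y_hat, k=2)
print(e_bar, variance)
```

The mean of the ratio residuals is close to one rather than zero, which is why the deviations are taken about that mean.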


For a complete comparison of alternative specifications,
these calculations should be performed both ways 4/; that is, not
only should the log regression results be transformed into linear
terms, but also the linear regression results should be transformed
into log terms (by taking logs of the $\hat{y}$'s and following the
procedures outlined above). Since the results are always biased in
favour of the specification which has been estimated, a two-way
comparison of the SEE's (say) might provide a guide to appropriate
action. For example, if the linear specification is preferred to
the log when the log results are recalculated so as to be comparable
to the linear, and is also preferred when the linear results are
recalculated so as to be comparable to the log, then there is strong
evidence that the linear is the preferred specification. More
often than not, however, the evidence will be ambiguous. Under these
conditions, there are no statistical procedures for reaching a
conclusion (other than prediction based on a new set of data - see
Section 8.1) and non-statistical theoretical considerations must
be relied upon if a choice is to be made between the two
specifications.


4/ See H. Theil, Principles of Econometrics, pp. 544-545. 




To perform the required recalculation of the equation statistics
in APL following an execution of the REGR program, type

      CONVERT N

where N is the base to which logs were taken originally. For example,
if $\log_{10} y$ was used as the dependent variable, type

      CONVERT 10

whereas if ln y was used, type

      CONVERT *1

(*1 in APL produces the value of the natural base e.)


4.11 Time Trends 
The operation

      T←25 1⍴⍳25

creates a [25x1] matrix (column vector) of the numbers 1 through
25. By using addition of a constant (section 4.2 above) the operation

      TYR←1946+T

creates a [25x1] matrix of the years 1947 through 1971.




Naturally these operations can cover any observations over any
time period. The result may be used as an identification vector
in a data matrix. In addition, it is often useful or necessary
to find out if the variables in a time-series problem have a time
trend. For example, consider the dependent variable. The most
common trends are either

(a) linear, that is $y = \alpha + \beta t$

or

(b) exponential, that is $y = A e^{\beta t}$,
so that $\ln y = \ln A + \beta t$

where t is an [nx1] matrix of equidistant numbers (defined above)
and ln refers to natural log. (Of course, the estimated $\beta$ will
be different in (a) and (b).) To estimate either (a) or (b) the
[nx1] matrix for t must be constructed. Either of the above
variables would be a satisfactory measure of the variable t; the
only difference in the estimated coefficients would be in the
intercept. To see this define

    tyr = 1946 + t

as above, then

    $$y = \alpha + \beta t = \alpha + \beta(tyr - 1946) = (\alpha - 1946\beta) + \beta\, tyr$$

The slope coefficient ($\beta$) is the same, but the intercept in the
regression using tyr has been reduced by $1946\beta$. Similar
calculations are possible for the exponential case and other
definitions of t. Note that the identification variable (discussed
in Chapter II) can be used as a time trend.
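
The equivalence of the two time variables can be verified with a small example. The trend series below is hypothetical and exact (no error term), and `simple_ols` is an illustrative helper rather than part of the REGR program.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b


t = list(range(1, 26))            # 1, 2, ..., 25
tyr = [1946 + v for v in t]       # 1947, ..., 1971
y = [5.0 + 2.0 * v for v in t]    # hypothetical linear trend

a_t, b_t = simple_ols(t, y)
a_yr, b_yr = simple_ols(tyr, y)
print(a_t, b_t)
print(a_yr, b_yr)
```

The slope is the same in both regressions, and the second intercept equals the first less 1946 times the slope.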

This analysis can be extended to cover cyclical variations
by using the 'multiplication by a variable' operation (Section
4.6), since it is possible to create the variables $t^2$, $t^3$, etc.
The addition of each power can add another turning point to the
relationship. For example, the quadratic relationship

    $$y = \beta_1 + \beta_2 t + \beta_3 t^2$$

has one turning point; a cubic can have two turning points, etc.
Using such variables, the coefficient $\beta_2$ identifies the trend and
the coefficients of higher powers of t identify the cycles in the
data.5/ As with the relationship involving powers of $x_j$ (Section
4.6), the connection between these non-linear relationships and
(nested) hypothesis tests is discussed in Chapter V (Section 5.4).


5/ This is the matrix P in Johnston, pp. 187-188. Note that the
definition in (6-12) incorporates an error -- the third column
should raise the numbers to the power of p, not multiply
them by p.


4.12 Lags


In time-series analysis it is often required to lag a variable
one or more periods so that values measured at the last quarter
or the last year appear as data for the present period. To define
lagged variables in APL, statements such as the following suffice:

      ¯1⌽V
10 1 7 4 9 3
      1⌽M
6 5 7
1 4 2
3 5 2
      ¯1⊖M
2 3 5
7 6 5
2 1 4

The operator ⌽ performs the rotation across the last dimension
of M (i.e., changes the order of the columns), while the operator
⊖ performs the rotation across the first dimension of M (i.e.,
changes the order of the rows).

Note that in most cases when a variable lagged T time periods
is required in the analysis, the variable would be redefined in a
statement such as the following:

      V←T↓(-T)⌽V

This statement lags the variable V by T time periods and drops the
first T elements off the front of the resulting lagged vector.
The regression should then be estimated over the new reduced number
of observations, unless values for missing periods at the beginning
of the sample are added to the new vector of lagged observations.
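
The rotate-and-drop idiom has a simple counterpart in other languages. The sketch below shows one way to obtain the same lagged vector in Python; the function name `lag` is an assumption made for illustration.

```python
def lag(values, periods):
    """Lag a series by `periods` observations and drop the first `periods`.

    Element i of the result is the value observed `periods` earlier than
    observation i + periods of the original series.
    """
    if periods < 0 or periods >= len(values):
        raise ValueError("periods must be between 0 and len(values) - 1")
    return values[: len(values) - periods]


v = [1, 7, 4, 9, 3, 10]
v_lag1 = lag(v, 1)     # lagged values matching observations 2 to 6 of v
v_current = v[1:]      # the corresponding current-period values
print(list(zip(v_current, v_lag1)))
```

The other variables in the regression must be shortened in the same way as `v_current`, so that all series cover the same reduced sample.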




4.13 Dummy Variables (Seasonality) 


A dummy variable is, almost always, a variable created
to contain only zeros and ones as data. The operation

      D←(12⍴0),12⍴1

creates a variable D containing 12 zeros followed by 12 ones.

This is the most convenient form and is applicable if there is 

an (additive) structural break in the relationship under consi- 
deration. Perhaps a more common form of a dummy variable occurs 
where zeros and ones are scattered throughout the vector as might 
be the case for dichotomous qualitative data. For example, a one 
might be allocated to an answer of yes and zero for no, or one for 
male and zero for female. Whichever form the dummy variable takes, 
the effect is to modify the intercept in the relationship only. 


For example, consider

    $$y = \alpha + \beta X + \gamma d$$

where d is an [nx1] matrix (column vector) of zeros and ones
and X is an [nx(k-1)] matrix of independent variables. When
d = 0,

    $$y = \alpha + \beta X$$

whereas when d = 1,

    $$y = (\alpha + \gamma) + \beta X .$$

The slope ($\beta$) does not change. A dummy variable may include any
combination of zeros and ones as suggested by a problem, but it is
dangerous to use a dummy variable containing only one or two ones
(or zeros) because the computer will allocate a coefficient ($\gamma$)
with a view to minimizing the errors on those one or two
observations. Consequently, the estimated magnitude of $\gamma$ may have
little relation to "the truth", since the dummy variable is not only
explaining any structural change but perhaps also some of the
remaining unexplained variation. As more observations become
available in each of the two categories, the problem becomes less
serious.

One important use of dummy variables is for seasonal adjustment.
Consider quarterly data as an example. A dummy variable can be
constructed for each quarter (denoted $d_1$ to $d_4$, say) taking on the
value 1 in that quarter and zero in all other quarters; that is,

    Quarter   d1   d2   d3   d4
       1       1    0    0    0
       2       0    1    0    0
       3       0    0    1    0
       4       0    0    0    1
       1       1    0    0    0
       2       0    1    0    0




Note that a horizontal summation of the four vectors yields an
[nx1] matrix (column vector) of 1's -- which has also been defined
as the intercept 'variable' (i.e., $x_1 = 1$). Consequently, the
inclusion of four seasonal dummies and an intercept violates assumption
(iv), which rules out linear dependence.6/ Two choices are available:

1) retain the four seasonal dummies and delete the intercept,7/ viz:

    $$y = \beta X + \gamma_1 d_1 + \gamma_2 d_2 + \gamma_3 d_3 + \gamma_4 d_4$$

so that when $d_1 = 1$, for example, $d_2 = d_3 = d_4 = 0$ and the
intercept for the first quarter is $\gamma_1$, and so on; or

2) retain the intercept and delete any one of the seasonal dummies
(say $d_4$), viz:

    $$y = \alpha + \beta X + \gamma_1 d_1 + \gamma_2 d_2 + \gamma_3 d_3$$

so that when $d_1 = 1$ and $d_2 = d_3 = 0$ the intercept for the
first quarter is $(\alpha + \gamma_1)$, etc. Note that $\alpha$ becomes the
intercept for the fourth quarter when $d_1 = d_2 = d_3 = 0$. Both methods,
when computed correctly, will yield the same results (e.g., $\gamma_4$ in
method 1 will equal $\alpha$ in method 2, etc.). In both cases it is
assumed that seasonality affects the intercept only -- the slope
is assumed to remain unaffected.
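
The linear dependence between a full set of seasonal dummies and the intercept can be seen by constructing the dummies. This is a minimal sketch; `seasonal_dummies` is a hypothetical helper and the sample is assumed to start in the first quarter.

```python
def seasonal_dummies(n, periods=4):
    """Return n rows of `periods` zero/one dummies, cycling through the seasons."""
    return [[1 if i % periods == q else 0 for q in range(periods)] for i in range(n)]


d = seasonal_dummies(8)
row_sums = [sum(row) for row in d]
print(d[:5])
print(row_sums)  # each row sums to one, duplicating the intercept column

# Method 2: keep the intercept and drop one dummy (here the fourth).
d_reduced = [row[:3] for row in d]
print(d_reduced[:4])
```

In the reduced set the fourth-quarter rows are all zeros, so the intercept alone describes that quarter.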


6/ This problem is discussed further in Section 6.2. 


7/ See the later Section 4.15 for further elaboration of estimating 
equations without intercepts. 




4.14 Interactive Dummy Variables 


If qualitative influences, structural break(s), or seasonality
are believed to affect the slope(s) of a relationship, their
effects can be tested by creating variables of the form

    $$dX = (d) \times (X)$$

using the vector or matrix multiplication (depending on the
dimensions of d and X)8/ operation as described above. These are called
interactive dummy variables and are very similar to the non-linear
variables discussed previously ($x_j^2$ and $t^2$, etc.), except that in
this case the multiplication is not performed on the variable itself,
but on another variable. Again, it is possible to denote Z = dX
and proceed as before. These variables are usually introduced with
intercept dummies to avoid erroneous conclusions. For example,
consider a relationship containing one dummy variable (d), and two
independent variables ($x_2$ and $x_3$). To test the effects of, for
example, a structural break on both intercept and slopes, estimate

    $$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 d + \gamma_2 (d x_2) + \gamma_3 (d x_3).$$

When d = 0,

    $$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3$$

whereas when d = 1,

    $$y = (\beta_1 + \gamma_1) + (\beta_2 + \gamma_2) x_2 + (\beta_3 + \gamma_3) x_3 .$$

8/ If d is not a single vector but a matrix, such as in the
four seasonal dummies case, the notation should be D.


Now both slopes and intercepts are different. Note that exactly
the same parameter estimates would be obtained if the two equations
were estimated separately on the relevant subsets of data
rather than the one equation with interactive dummies on all of
the data.9/ The development of interactive seasonal dummies -- a
relatively uncommon econometric approach -- would proceed along
identical lines using either of the two methods for seasonal dummies
outlined above. It is probably uncommon because of the large
number of parameters which would be introduced, all of which must
be estimated. For example, with 2 independent variables and 4
seasonal dummies there are 12 parameters, the same number as with 2
independent variables, an intercept and 3 seasonal dummies.10/
The method chosen depends on the purpose at hand and, in particular,
what hypotheses are to be tested. This is covered in the following
chapter.
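
The construction of the interactive terms for the one-dummy example can be sketched as follows. The function name `design_row` and the observations are hypothetical; the columns follow the order intercept, x2, x3, d, d*x2, d*x3.

```python
def design_row(x2, x3, d):
    """Regressors for y = b1 + b2*x2 + b3*x3 + g1*d + g2*(d*x2) + g3*(d*x3)."""
    return [1.0, x2, x3, float(d), d * x2, d * x3]


# Hypothetical observations (x2, x3, d), with d = 1 after a structural break.
observations = [(2.0, 5.0, 0), (3.0, 4.0, 0), (4.0, 6.0, 1), (5.0, 7.0, 1)]
design = [design_row(*obs) for obs in observations]
for row in design:
    print(row)
```

For observations with d = 0 the last three columns vanish, so only the first three coefficients apply; for d = 1 each coefficient is shifted by its interactive counterpart.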


9/ Using all of the data is sometimes referred to as pooling
the data. Note that with pooling only one $R^2$, SSR, etc., is
obtained, whereas when two separate regressions are estimated
there is an $R^2$, SSR, etc., for each regression.

10/ That is, 4 on d's and 8 on dx's = 12; the intercept $\beta_1$,
2 on x's, 3 on d's and 6 on dx's = 12.




4.15 Prior Restrictions 


To incorporate the prior restriction that

    $$\beta_j = 0 \quad \text{for some } j$$

simply eliminate the associated $x_j$ from the relationship.
A special case involves $\beta_1$, the intercept term. Elimination
of $\beta_1$ (or $x_1$) forces the relationship through the origin,
thereby incorporating the restriction that y = 0 when all
$x_j = 0$ (j = 2, ... , k). Since computer packages (usually)
automatically insert the $x_1$ vector, it is necessary to instruct
suppression of this insertion. To do this in SHARP APL,
invoke the (non-default) state setting NOCONSTANT before executing
the REGR program (see p. 38). The user is warned, however,
that the $R^2$ does not have its usual interpretation under these
conditions.11/


11/ A good discussion of this point is contained in Section 4.4
of Theil's Principles of Econometrics (note especially footnote
4 on p. ). With no intercept term it is not
appropriate to consider deviations around the means; thus
variances are also inappropriate. Consequently, it does
not really make sense to use a definition of $R^2$ which refers
to the percentage of the variance of the dependent variable
explained. The alternative is to consider the percentage of
the total sum of squares (about zero) explained, i.e. use y'y
in the denominator rather than $y'y - n\bar{y}^2$. This is equivalent
to the definition $R^2 = 1 - (e'e/y'y)$. Note that since $n\bar{y}^2$ is
a non-negative number, this measure of $R^2$ will always be greater
than or equal to the variance definition of $R^2$. For the
sake of consistency it is always the variance definition of $R^2$
that is produced by the REGR program. To obtain the total sum
of squares definition of $R^2$ following an execution of REGR when
the state setting NOCONSTANT is in effect, type

      1-((⍉ΔE)+.×ΔE)÷+/Y*2

where Y denotes the dependent variable in the relationship.
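
The two definitions of R-squared for a regression through the origin can be compared numerically. The sketch below uses hypothetical data and a single regressor; it illustrates the arithmetic only and does not reproduce the REGR program.

```python
def r_squared_both(y, residuals):
    """Return (variance-based R2, total-sum-of-squares R2)."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum(e * e for e in residuals)
    total_ss = sum(v * v for v in y)
    r2_variance = 1 - sse / (total_ss - n * y_bar ** 2)
    r2_total = 1 - sse / total_ss
    return r2_variance, r2_total


# Hypothetical regression through the origin: y = b*x, b = sum(x*y)/sum(x*x).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
residuals = [c - b * a for a, c in zip(x, y)]

r2_variance, r2_total = r_squared_both(y, residuals)
print(r2_variance, r2_total)
```

The total-sum-of-squares measure is never smaller than the variance-based one, because its denominator is larger by n times the squared mean of y.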


To incorporate the prior restriction that a parameter 
take on a certain value, say B; » that is 


~ 


CH= RE for some j 
B) B; 0 dog 


incorporate the prior value and transfer the variable to the 
left-hand side of the relation (using the multiplication by a 
constant and subtraction of variables operations, sections 

4.5 and 4.3, respectively) and estimate the relationship with 

a new dependent variable. For example, if the prior restriction
is $\beta_2 = 0.5$ in the relationship

    $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3$

then estimate the relationship

    $(y - 0.5 x_2) = \beta_1 + \beta_3 x_3$ .

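The transfer of the restricted variable to the left-hand side can be sketched in a few lines of Python. This is an illustration only: the helper `simple_ols` and the data are hypothetical, and the data are generated without error so that the restricted regression recovers the remaining parameters exactly.

```python
def simple_ols(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up data generated exactly from y = 2 + 0.5*x2 + 3*x3.
x2 = [1.0, 4.0, 2.0, 7.0, 5.0]
x3 = [3.0, 1.0, 6.0, 2.0, 8.0]
y = [2.0 + 0.5 * a + 3.0 * b for a, b in zip(x2, x3)]

# Impose beta_2 = 0.5 by forming the new dependent variable y - 0.5*x2.
y_star = [yi - 0.5 * a for yi, a in zip(y, x2)]
beta_1, beta_3 = simple_ols(x3, y_star)
print(round(beta_1, 6), round(beta_3, 6))
```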
Similarly, relationships involving lags resulting in
first differences, such as

    $(y_t - y_{t-1}) = \beta_1 + \beta_2 (x_t - x_{t-1})$

impose the constraint that the coefficient on the lagged
dependent variable ($y_{t-1}$) is one, as well as the constraint that
the coefficient on $x_t$ is the same as that on $x_t$ lagged one
observation. That is, the above example incorporates an
additional constraint of the form

    $\beta_j = \beta_k$,  $j \neq k$

or, two parameters must assume the same value. In the above example, the
coefficient on $x_{t-1}$ is the exact negative of that on $x_t$.


Finally, to incorporate the prior restriction that the parameters
in an equation are related through some linear combination such as

    $\beta_j + \beta_k = c$,  $j \neq k$



where c is some prespecified constant (or scalar), first solve the 
restriction for one of the parameters, then substitute the result into 
the original equation and rearrange the variables to facilitate estimation. 
Two reasonably common examples can be used to illustrate this procedure. 
First, consider the multivariate Cobb-Douglas specification (see page 61
above)

    $y = A x_2^{\beta_2} x_3^{\beta_3}$

and consider the prior restriction (often referred to as constant returns
to scale)

    $\beta_2 + \beta_3 = 1$ .

By solving the restriction for $\beta_3$, the original specification can be
written as

    $y / x_3 = A (x_2 / x_3)^{\beta_2}$

which can be estimated as

    $\log(y / x_3) = \alpha + \beta_2 \log(x_2 / x_3)$

where $\alpha = \log A$. An estimate of $\beta_3$ is then obtained from the prior
restriction as

    $\hat\beta_3 = 1 - \hat\beta_2$ .
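The constant returns to scale example can be traced through numerically. The sketch below is illustrative only: the helper `simple_ols`, the parameter values (A = 2, $\beta_2$ = 0.3) and the data are all assumptions, and the data contain no error term so the estimates reproduce the assumed values.

```python
import math


def simple_ols(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up inputs obeying y = A * x2**b2 * x3**(1 - b2).
A, beta_2_true = 2.0, 0.3
x2 = [1.0, 2.0, 4.0, 8.0, 3.0]
x3 = [2.0, 1.0, 5.0, 3.0, 9.0]
y = [A * a ** beta_2_true * b ** (1.0 - beta_2_true) for a, b in zip(x2, x3)]

# Regress log(y/x3) on log(x2/x3), then recover beta_3 from the restriction.
lhs = [math.log(yi / b) for yi, b in zip(y, x3)]
rhs = [math.log(a / b) for a, b in zip(x2, x3)]
alpha, beta_2 = simple_ols(rhs, lhs)
beta_3 = 1.0 - beta_2
print(round(math.exp(alpha), 6), round(beta_2, 6), round(beta_3, 6))
```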


The second example is often related to seasonal adjustment (see Section
4.13) where it can be specified that the seasonal coefficients sum to
zero, or

    $\gamma_1 + \gamma_2 + \gamma_3 + \gamma_4 = 0$ .

Using this prior restriction enables an easy identification of the
seasonal peak and seasonal trough (largest positive and negative
coefficients, respectively, for the variables defined on pp. 73-74
above). 12/ Solving for one of these coefficients (say $\gamma_4$) and
substituting enables the original specification to be written as

    $y = \alpha + \beta x + \gamma_1 (d_1 - d_4) + \gamma_2 (d_2 - d_4) + \gamma_3 (d_3 - d_4)$

which can be estimated by defining the three new independent variables
from the original four seasonal dummies. An estimate of the eliminated
coefficient is again obtained from the prior restriction as

    $\hat\gamma_4 = -(\hat\gamma_1 + \hat\gamma_2 + \hat\gamma_3)$ .

12/ It also has the advantage of purging the estimated intercept of any
seasonal influence.


Other prior restrictions which specify a linear relationship between 


the coefficients in an equation can be handled in a similar manner. 
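For the seasonal example, the three new independent variables and the eliminated coefficient can be built as follows. This is a minimal Python sketch: the function names and the three coefficient values are hypothetical, chosen only to show the bookkeeping.

```python
def restricted_seasonal_variables(quarters):
    """Build the rows (d1-d4, d2-d4, d3-d4) for quarter numbers 1..4."""
    rows = []
    for q in quarters:
        d = [1 if q == j else 0 for j in (1, 2, 3, 4)]
        rows.append((d[0] - d[3], d[1] - d[3], d[2] - d[3]))
    return rows


def eliminated_coefficient(gamma_1, gamma_2, gamma_3):
    """Recover gamma_4 from the restriction that the four sum to zero."""
    return -(gamma_1 + gamma_2 + gamma_3)


rows = restricted_seasonal_variables([1, 2, 3, 4])
gamma_4 = eliminated_coefficient(1.5, -0.5, 2.0)   # made-up estimates
print(rows, gamma_4)
```

Each fourth-quarter observation takes the value -1 on all three new variables, which is what imposes the sum-to-zero restriction.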


To summarise, consider the general relationship involving three
independent variables:

    $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

Incorporating examples of the prior restrictions discussed above,

    $\beta_1 = 0$
    $\beta_2 + \beta_3 = 0$
    $\beta_4 = 1$

reduces the relationship to be estimated to

    $(y - x_4) = \beta_2 (x_2 - x_3)$ .


The incorporation of prior restrictions has reduced the number of 
parameters to be estimated from 4 to 1 in this case. Note that these 
restrictions should be tested using the hypothesis tests outlined in 


Chapter V. 




CHAPTER V 
HYPOTHESIS TESTING 


5.1 Introduction 


This chapter briefly summarises most of the common 
applications of statistical formulae, together with the relevant
null hypotheses (denoted $H_0$) and alternative hypotheses
(denoted $H_1$). These hypotheses refer to the true or underlying
population values of the relationship. Tests involving
the estimated values are used to deduce conclusions regarding 
the true relationship. A high value for the relevant statistic 
usually results in a rejection of the null hypothesis in favour 
of the alternative. The formal tests are conducted using tables 
which can be found in the appendix to most texts, including 
Johnston. These tables always require that the user calculate 
the relevant degrees of freedom. The following summary includes 
all the tests mentioned in Chapters III and IV, together with 
some extensions. In all cases it is assumed that there are n 
observations, that there are k parameters to be estimated 
including the intercept, and that the conclusions are based on 


this one sample of observations. 




5.2 Selection of a Confidence Level 


When testing a hypothesis, two incorrect conclusions are
possible. These are called:

    Type I error  -- reject $H_0$ when it is true

    Type II error -- accept $H_0$ when it is false

Conventional confidence (or significance) levels, such as 0.05
(5% or 95%) and 0.01 (1% or 99%), 1/ are used in hypothesis testing
with Type I errors in mind since they reflect a desire to maintain
a low probability of rejecting a null hypothesis when in fact it
is true (e.g., a rejection of a shipment which meets specifications
in a quality control situation or rejection of a satisfactory
parameter estimate). The higher the confidence level (e.g., 99%)
the lower this probability and vice versa. However, this indicates
nothing concerning a Type II error (e.g., acceptance of a bad
shipment or unsatisfactory parameter estimate). In general, for a
fixed sample size, the probability of a Type II error can only be
decreased by increasing the probability of a Type I error and vice
versa. Because of this trade-off, the chosen level of significance
should be determined by considering the relative seriousness of the
two types of errors, although convention suggests the use of either
the 0.05 or 0.01 level. This is true for both t and F tests.


1/ Either value tends to be used in applied econometrics -- little 
confusion results since it should generally be clear from the 
context. 


5.3 t-Tests 2/


There are four commonly used applications of the t-statistic. 
The first three are closely related to the prior restrictions 
discussed at the end of the previous chapter while both the first 
and the fourth were referred to in Chapter III. In summary, they 


are: 


(i) The most common application is testing whether
    an estimated coefficient ($\hat\beta_j$) is significantly
    different from zero; that is, $H_0$: $\beta_j = 0$. In
    this case

        $t_j = \hat\beta_j / S_j$   or   $t_j = \hat\beta_j / \sqrt{\mathrm{var}(\hat\beta_j)}$

    with (n-k) degrees of freedom.


(ii) An extension is testing whether an estimated
     coefficient is significantly different from some
     prescribed number, say $\beta_j^0$; then $H_0$: $\beta_j = \beta_j^0$ or
     $\beta_j - \beta_j^0 = 0$, and

        $t_j = (\hat\beta_j - \beta_j^0) / S_j$

     with (n-k) degrees of freedom. Only $S_j$ appears in
     the denominator since the standard error on a (fixed)
     prescribed number is zero.

2/ See the definition in Johnston, p. 26. Recall that these tests
are based on the assumption that the underlying errors are
normally distributed.


(iii) A further extension involves testing the equality,
      sum or difference of two estimated coefficients in
      a regression. For example, consider $\beta_2$ and $\beta_3$.
      Then, $H_0$: $\beta_2 = \beta_3$   or  $\beta_2 - \beta_3 = 0$
      or    $H_0$: $\beta_2 = -\beta_3$  or  $\beta_2 + \beta_3 = 0$ .

      Now the relevant t-statistics (again with (n-k) degrees
      of freedom) are

        $t = \dfrac{\hat\beta_2 \mp \hat\beta_3}{\sqrt{\mathrm{var}(\hat\beta_2) + \mathrm{var}(\hat\beta_3) \mp 2\,\mathrm{covar}(\hat\beta_2, \hat\beta_3)}}$

      where the $\mp$ alternative refers to each of the above $H_0$'s
      (upper sign for the first, lower sign for the second)
      and the values in the denominator are obtained from the
      variance-covariance matrix. 3/ This is the main use of the
      covariances (or off-diagonal terms) in this matrix.
      (Hint: Do not confuse it with the correlation matrix -
      see Chapter III for the definition.)


3/ See result (9), Section 3.1 and the subsequent discussion, or Johnston, 
p. 179. 


(iv) A different application involves testing the significance
     of a simple correlation coefficient (r). The null
     hypothesis postulates a zero simple correlation and can be
     expressed as $H_0$: $\rho = 0$. The relevant t-statistic is

        $t = \dfrac{r \sqrt{n-2}}{\sqrt{1 - r^2}}$

     distributed with (n-2) degrees of freedom. 4/
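The t-statistics in applications (i) through (iv) are simple enough to compute by hand from the regression output. The following Python sketch is illustrative only: the function names and all numerical inputs (coefficients, variances, covariance, r and n) are made up.

```python
import math


def t_against_value(b_hat, std_err, b_0=0.0):
    """Cases (i) and (ii): test a coefficient against zero or a fixed b_0."""
    return (b_hat - b_0) / std_err


def t_linear_pair(b2, b3, var2, var3, cov23, sign):
    """Case (iii): sign=-1 tests b2 - b3 = 0, sign=+1 tests b2 + b3 = 0."""
    return (b2 + sign * b3) / math.sqrt(var2 + var3 + sign * 2.0 * cov23)


def t_correlation(r, n):
    """Case (iv): test a simple correlation coefficient against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


t_zero = t_against_value(0.8, 0.2)                         # case (i)
t_half = t_against_value(0.8, 0.2, b_0=0.5)                # case (ii)
t_diff = t_linear_pair(0.8, 0.5, 0.04, 0.09, -0.02, -1)    # b2 - b3 = 0
t_sum = t_linear_pair(0.8, 0.5, 0.04, 0.09, -0.02, +1)     # b2 + b3 = 0
t_corr = t_correlation(0.6, 18)                            # case (iv)
print(t_zero, t_half, t_diff, t_sum, t_corr)
```

Note that the covariance enters with the same sign as the combination being tested, so it is subtracted (twice) for a difference and added (twice) for a sum.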


These tests may be conducted as one-tail or two-tail tests.
This choice depends on the alternative hypothesis ($H_1$). If $H_1$
involves a 'not equal to' sign, that is

    (i)   $H_1$: $\beta_j \neq 0$
    (ii)  $H_1$: $\beta_j - \beta_j^0 \neq 0$
    (iii) $H_1$: $\beta_2 \mp \beta_3 \neq 0$
    (iv)  $H_1$: $\rho \neq 0$

then a two-tail test is required. However, if $H_1$ specifies that
a parameter be positive (or negative), that is

    (i)   $H_1$: $\beta_j > 0$ (or < 0)
    (ii)  $H_1$: $\beta_j - \beta_j^0 > 0$ (or < 0)
    (iii) $H_1$: $\beta_2 \mp \beta_3 > 0$ (or < 0)
    (iv)  $H_1$: $\rho > 0$ (or < 0)

then a one-tail test is required. Since hypotheses about the true
underlying population come from theoretical specifications, the
user must use theory to specify the appropriate alternative
hypothesis $H_1$. If theory is non-existent or inadequate in this regard,
then two-tail tests are usually conducted by default.

4/ Johnston, p. 37.


Therefore, before conducting a t-test the analyst must 


(a) select a confidence level -- say 95% or 99% 


(b) decide on $H_1$ -- that is, a one or two-tail test.

A critical t value can then be ascertained from the appropriate
table, remembering that for small samples (say n less than 60)
the appropriate degrees of freedom must be considered. The following
table of selected critical values gives the user some idea of the
magnitudes involved: 5/

    degrees of freedom      (n-k) = 20          (n-k) = ∞
    confidence level        95%      99%        95%      99%
    ----------------------------------------------------------
    one-tail               1.725    2.528      1.645    2.326
    two-tail               2.086    2.845      1.960    2.576


5/ The appropriate table in Johnston appears on p. 426 and the 
confidence levels refer to one-tail tests. 




Since the most common t-tests tend to be two-tail with 95%
confidence levels, the critical t-value of 2 is often used for a rough
guide -- but remember it is rough! In practical applications of
case (iv), the null hypothesis is usually quite easily rejected.

One final note -- a t-value can be either positive or negative
because the numerator can be either positive or negative, so the
critical values must apply to both positive and negative t-values.
If the modulus of the estimated t-value 6/ is larger than the critical
value, $H_0$ is rejected in favour of $H_1$; if it is smaller, $H_0$
is not rejected. In the case where a one-tail test is called for,
for example,

    $H_1$: $\beta_j > 0$

but the estimated $\beta_j$ turns out to be negative (!), clearly $H_1$
can never be accepted even if the modulus of the associated t-value
(say $t_j = -10$) is greater than the critical value.
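The decision rule, including the warning about a wrong-signed estimate in a one-tail test, can be written out explicitly. The Python sketch below is an illustration: the function `reject_null` and its `alternative` labels are hypothetical, and the critical values 1.725 and 2.086 are taken from the table of selected critical values.

```python
def reject_null(t_value, critical, alternative="two-tail"):
    """Return True when H0 is rejected in favour of the stated H1."""
    if alternative == "two-tail":      # H1: parameter != 0
        return abs(t_value) > critical
    if alternative == "greater":       # H1: parameter > 0
        return t_value > critical
    if alternative == "less":          # H1: parameter < 0
        return t_value < -critical
    raise ValueError("alternative must be 'two-tail', 'greater' or 'less'")


# A t-value of -10 is decisive in a two-tail test ...
print(reject_null(-10.0, 2.086, "two-tail"))
# ... but can never support H1: beta_j > 0, however large its modulus.
print(reject_null(-10.0, 1.725, "greater"))
```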


5.4 Two Uses of t-Tests 


This section presents two uses of t-tests, both mentioned 
in Chapter IV, as illustrative of the number of different uses 
to which t-tests can be applied. A brief discussion of nested 
hypothesis testing is presented, as well as the choice of the equa- 
tion specification to facilitate the testing of desired null 


hypotheses. 


6/ The modulus, or absolute value, simply drops the negative sign 
if it exists. 




Nested hypothesis testing is relevant whenever polynomial- 
type relationships are being estimated since the choice of the 
degree of the polynomial is usually quite arbitrary. This problem 
is encountered in both polynomial regression (see Section 4.6) and 
in the analysis of cyclical variations (see Section 4.11). In both 
cases, the highest degree of polynomial suggested by theory and/or 
data availability should probably be chosen and the equation
estimated including all lower powers of the variable. For example, if
a cubic is chosen, the equation should be estimated including $x_j$
and $x_j^2$ as well as $x_j^3$. The coefficient on the variable with
the highest power should then be subjected to a t-test. If it is
not significant the associated variable should be eliminated and the
equation re-estimated. The procedure should then be repeated and
continued until a coefficient is found to be significant. Then the 
associated variable should be retained in the specification and the 
procedure continued with variables of lower powers until only signi- 
ficant variables remain in the specification. 

There can be difficulties with this approach because, although
the variables are not linear combinations of each other ($x_j^2$ is not
a linear combination of two $x_j$'s), they may be approximately linear
over the range being used, and problems can arise. (These are
discussed in Section 6.2.) Finally, recall that there may be some prior
sign expectations on the estimated coefficients, in which case the
relevant t-tests are one-tail tests (see the previous section).

The choice of the equation specification to facilitate the 
testing of desired null hypotheses particularly arises when dummy 
and interactive dummy variables are being used in an equation (see 
Sections 4.13 and 4.14). Consider the case of seasonal adjustment 


where either

    $y = \alpha + \beta x + \gamma_1 d_1 + \gamma_2 d_2 + \gamma_3 d_3$

or

    $y = \beta x + \delta_1 d_1 + \delta_2 d_2 + \delta_3 d_3 + \delta_4 d_4$

can be used to estimate the coefficients of the equation. The
correspondence between the coefficients can be established as:

    $\alpha = \delta_4$               intercept for 4th quarter
    $\alpha + \gamma_1 = \delta_1$    intercept for 1st quarter
    $\alpha + \gamma_2 = \delta_2$    intercept for 2nd quarter
    $\alpha + \gamma_3 = \delta_3$    intercept for 3rd quarter .


Consequently the t-statistics on both $\alpha$ and $\delta_4$ provide
a direct test of whether or not the intercept for the fourth quarter
is significant. The same is true for all the remaining $\delta$
coefficients. In turn, the t-statistics on the $\gamma$ coefficients provide
a test of whether the remaining quarters are significantly different
from the fourth quarter. To see this note that the relevant null
hypothesis is really

    $H_0$: $\alpha + \gamma_j = \alpha$   or   $\gamma_j = 0$   (j = 1, 2, 3)

which is the null hypothesis for the associated t-test. This means
that if the desired null hypothesis is "Is the second quarter
significantly different from the fourth quarter?", then the t-statistic
on the coefficient $\gamma_2$ above gives a direct test of this hypothesis.
But if the desired null hypothesis is "Are the first and third
quarters significantly different?" the appropriate test is not so
easy. In the first equation above it involves

    $H_0$: $\alpha + \gamma_1 = \alpha + \gamma_3$

which implies

    $H_0$: $\gamma_1 = \gamma_3$

or alternatively

    $H_0$: $\gamma_1 - \gamma_3 = 0$ .

In the second equation it involves

    $H_0$: $\delta_1 = \delta_3$

or

    $H_0$: $\delta_1 - \delta_3 = 0$ .



Both of these null hypotheses can be tested using the formula
described in t-test (iii) in the previous section - but this involves
a calculation by the analyst employing the variance-covariance
matrix. By respecifying the equation it is possible to get the
computer to do this calculation automatically. This can be achieved
by using the first formulation, which includes an intercept,
eliminating either $d_1$ or $d_3$, and including all three remaining
seasonal dummies. For example, using

    $y = \alpha + \beta x + \alpha_2 d_2 + \alpha_3 d_3 + \alpha_4 d_4$

involves the null hypothesis

    $H_0$: $\alpha = \alpha + \alpha_3$

or

    $H_0$: $\alpha_3 = 0$

so the t-statistic on $\alpha_3$ will be the desired test statistic and
will be printed automatically as part of the computer output from
the regression. Similar respecifications are possible for all
equations involving dummy and interactive dummy variables and should
be considered whenever a desired null hypothesis potentially
involves the comparison of two estimated coefficients in the same
equation.


5.5 F-Tests 


An F-statistic is a ratio of two sums of squares, each divided 
by the appropriate degrees of freedom. Consequently, it is speci- 
fied by two degrees of freedom, with those of the numerator tradition- 
ally appearing before those of the denominator. Three such tests 
can be summarised here, although further applications will be referred 


to in the following chapter. 


(i) The first F-test, mentioned in Chapter III, tests the overall
significance of the regression; that is, it tests

    $H_0$: $\beta_2 = \beta_3 = \ldots = \beta_k = 0$ .

Recall that the F-statistic is defined as

    $F[k-1, n-k] = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$

and is distributed with (k-1) and (n-k) degrees of freedom. 7/ This
is the F-statistic (usually) provided by computer printouts. Note
that this test does not include the intercept term; it is a test on
slopes only. Note also the relationship between F and $R^2$, so
that given a value of $R^2$ it is easy to compute the F-statistic.


7/ See Section 3.1, result (16) or Johnston, pp. 142-143. 


An F-test for a simple regression (one independent variable
and an intercept) involves two parameters, so that k-1 = 1 and the
null hypothesis is $H_0$: $\beta_2 = 0$. This is equivalent to a t-test
on the same null hypothesis. In fact, $F = t^2$. This equivalence
holds whenever the degrees of freedom for the numerator in an
F-statistic equal one.
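Both points, computing F from $R^2$ and the equivalence with the squared t-statistic, can be checked with a few lines of Python. This is a sketch with made-up values ($R^2$ = 0.5, n = 22, k = 2); the function name `f_overall` is hypothetical. It relies on the standard result that in a simple regression the t-statistic on the slope equals the correlation t-statistic of t-test (iv).

```python
import math


def f_overall(r_squared, n, k):
    """F[k-1, n-k] for the overall significance of the regression."""
    return (r_squared / (k - 1)) / ((1.0 - r_squared) / (n - k))


# Simple regression: one independent variable plus an intercept (k = 2).
r_squared, n, k = 0.5, 22, 2
f_value = f_overall(r_squared, n, k)

# t-statistic on the slope, via the simple correlation r (r**2 = R2).
r = math.sqrt(r_squared)
t_value = r * math.sqrt(n - 2) / math.sqrt(1.0 - r_squared)
print(f_value, t_value ** 2)
```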


(ii) This test can be extended to cover any subgroups of several 
coefficients by considering the ratio of the additional 
(incremental) explained sum of squares to the residual sum 

of squares. This method is called an analysis of variance. In 
this case the ordering of the independent variables may be 
important, since the explained sum of squares is an increment, 
given that the previous independent variables have already 

been included in the regression equation. Note again that a
test on a subgroup of one (i.e., a single coefficient, say
$\beta_k$) is equivalent to the t-test (i) presented above with (n-k)
degrees of freedom. 8/


8/ See Section 5.3 or Johnston, p. 144. 




(iii) These concepts can be extended to the analysis of covariance. 
Here the objective is to test whether groups of data were 
derived from the same underlying population. If all the 
data were combined, this is referred to as pooling. Three 
different F-tests are conducted, based on three sets of 
regressions - one on the pooled data, one on the pooled data 
with intercept dummies, and a third set of regressions which 


includes one regression for each class of data. They test for 


a) homogeneity of slopes, then 


b) homogeneity of intercepts (given homogeneous 
slopes), and finally 


c) overall homogeneity (of both slopes and inter- 
cepts) 


in that order. A number of different levels of generality 


are possible in these tests: 


(1) 2 classes of equal size 
(2) 2 classes of unequal size 
(3) p classes of equal size 


(4) p classes of unequal size . 


The relevant F-tests for each of these cases are presented below.
Case (1) can be readily incorporated into case (2), so
consider 2 classes of size n and m, respectively (and assume
n, m > k). Denote 9/

    A = sum of squared residuals from pooled data
        (n+m observations),

    B = sum of squared residuals from n observations,

    C = sum of squared residuals from m observations,

    D = sum of squared residuals from pooled data
        with a single (intercept) dummy variable.

These SSR are (usually) part of the regression output (see Section
3.2). Then the appropriate F-tests are

    (a) slopes:      $F[k-1, n+m-2k] = \dfrac{[D-(B+C)]/(k-1)}{(B+C)/(n+m-2k)}$

    (b) intercepts:  $F[1, n+m-k-1] = \dfrac{(A-D)/1}{D/(n+m-k-1)}$           (20)

    (c) overall:     $F[k, n+m-2k] = \dfrac{[A-(B+C)]/k}{(B+C)/(n+m-2k)}$

Note that (b) is equivalent to a t-test in the case of 2 classes.
Test (c) is often called the Chow-Test. 10/ It can be approximated
for the case when m < k, that is, when there are insufficient
observations to estimate the relationship determining C, by 11/

    $F[m, n-k] = \dfrac{(A-B)/m}{B/(n-k)}$ .

9/ This notation has been chosen since experience has shown that
the notation in Johnston, pp. 196-199, seems to confuse the
reader. For correspondence, note that $S_1 = A-D$, $S_2 = D$,
$S_3 = D-(B+C)$, $S_4 = B+C$ and therefore $S_1 + S_3 = A-(B+C)$.

10/ After G.C. Chow, "Tests of Equality Between Sets of Coefficients
in Two Linear Regressions", Econometrica, Vol. 28 (1960),
pp. 591-605.

11/ See Johnston, p. 207.


For cases (3) and (4), re-define D with (p-1) intercept
dummies and extend B and C to p regressions. Define $\Sigma = B + C + \ldots$
as the addition of the sum of squared residuals and apply formulae
(a) through (c), replacing (B+C) by $\Sigma$ and replacing the lower case
letters by the appropriate degrees of freedom. See Johnston, pp.
198-199, for case (3). Note that the appropriate degrees of freedom
in these formulae are obtained by adding or subtracting the degrees
of freedom of the corresponding SSR in the same way as the SSR
themselves were added and subtracted. For example, the degrees of
freedom for the numerator in (a) can be calculated as follows:


D has (m+n)-(k+1) degrees of freedom 
B has (n-k) degrees of freedom 


C has (m-k) degrees of freedom 
so that 
D-(B+C) has (m+n)-(k+1)-(n-k+m-k) 
or 
m+n-k-1-n+k-m+k = k-1 degrees of freedom. 


All other degrees of freedom calculations follow the same pattern. 
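For the two-class case the three F-ratios labelled (20) follow directly from the four sums of squared residuals. The Python sketch below is an illustration under assumed inputs: the function name `covariance_f_tests` and the values of A, B, C, D, n, m and k are made up, chosen so that A ≥ D ≥ B + C as the nesting of the regressions requires.

```python
def covariance_f_tests(a, b, c, d, n, m, k):
    """F-ratios for slopes, intercepts and overall homogeneity (2 classes).

    a: SSR from the pooled regression (n+m observations)
    b, c: SSR from the separate regressions on n and m observations
    d: SSR from the pooled regression with one intercept dummy
    """
    within = b + c
    df_within = n + m - 2 * k
    slopes = ((d - within) / (k - 1)) / (within / df_within)
    intercepts = ((a - d) / 1) / (d / (n + m - k - 1))
    overall = ((a - within) / k) / (within / df_within)
    return {"slopes": slopes, "intercepts": intercepts, "overall": overall}


tests = covariance_f_tests(a=120.0, b=40.0, c=50.0, d=100.0, n=30, m=26, k=3)
for name, value in tests.items():
    print(name, round(value, 4))
```

Each ratio is then compared with the tabulated critical value for its own pair of degrees of freedom.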

The F-test is quite a stringent statistical test and it is
not uncommon to reject $H_0$. Values for F in the thousands are
not unusual. In case (i) this occurs when $R^2$ is very large and
the relevant alternative hypothesis ($H_1$) is that the overall
relationship is significant. This can arise if at least one of the
slope coefficients is significantly different from zero, so that
one significant $t_j$-value will usually result in a significant
F-value. However, since it is a joint test on a number of slope
parameters, a significant F-value on the overall relationship can
result even if none of the individual parameters is significant
(by a t-test), although the resulting regression is usually considered
unsatisfactory. For these reasons this is not in practice a very
useful test statistic in case (i).


The output required for the analysis of variance (ii) is
often not available as part of a regression output. It is
available in the REGR program. To use this recall that 12/

    TSS          = $y'y - n\bar{y}^2$
    RSS (or SSR) = $e'e$
and ESS          = TSS - RSS

where ESS can be broken into k-1 components

    $ESS = \sum_{j=2}^{k} ESS_j$ .

$ESS_j$ represents the additional (or incremental) sum of squares
explained by independent variable $x_j$ given that variables $x_2$
through $x_{j-1}$ are already included in the regression. Consequently
the order of the independent variables becomes important - here $x_j$
must be the last independent variable. The significance of this
variable, or this and a subsequent group of variables, can be tested.
Suppose the variables $x_2$ through $x_r$ are included and interest
is focused on the last (k-r) variables. The relevant null
hypothesis is

    $H_0$: $\beta_{r+1} = \beta_{r+2} = \ldots = \beta_k = 0$

and the relevant F-statistic is

    $F[k-r, n-k] = \dfrac{\sum_{j=r+1}^{k} ESS_j / (k-r)}{SSR / (n-k)}$ .

The denominator is $S^2$, the square of the standard error of the
estimate. If r+1 = k, the object is simply to test the significance
of the last variable, and F reduces to

    $F[1, n-k] = \dfrac{ESS_k}{S^2}$

which is equivalent to the t-statistic on that coefficient squared
($t_k^2$). These F-statistics must be calculated by the analyst.

12/ See Section 3.1.
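Since this F-statistic must be calculated by the analyst, a short Python sketch may help. It is illustrative only: the function name `f_incremental` and the sums of squares are made up, and the incremental explained sums of squares are assumed to have been read from the regression output for the last variables in the order entered.

```python
def f_incremental(ess_increments, ssr, n, k):
    """F-ratio for the joint significance of the last variables entered.

    ess_increments: incremental explained sums of squares, one per
                    variable under test
    ssr: residual sum of squares from the full regression
    """
    q = len(ess_increments)        # numerator degrees of freedom
    s_squared = ssr / (n - k)      # squared standard error of the estimate
    return (sum(ess_increments) / q) / s_squared


# Last two of k = 4 parameters, n = 28 observations (made-up numbers).
f_group = f_incremental([12.0, 8.0], ssr=50.0, n=28, k=4)
# Last variable only: equals the square of its t-statistic.
f_last = f_incremental([8.0], ssr=50.0, n=28, k=4)
print(round(f_group, 4), round(f_last, 4))
```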

The information needed to conduct the analysis of covariance
(iii) involves the SSR from p+2 regressions where there are p
classes of data. First, there are the p individual regressions
on each class of data. Secondly, there is one regression of the
same form on all the data pooled together. Finally, there is one
regression of the same form but including (p-1) (intercept) dummy
variables. The null hypotheses being tested are

    (a) that all the slope coefficients of the first
        p regressions are the same,

    (b) that, given that the slopes are the same
        across the p regressions, the intercepts
        of the first p regressions are the same,
        and

    (c) that both the slopes and intercepts of the
        first p regressions are the same.

Note that (c) can be conducted without (a) or (b); (b) assumes
that (a) has been conducted and the null hypothesis accepted, and a
rejection of the null hypothesis for either (a) or (b) should result
in a rejection of the null hypothesis for (c). The general
F-statistic for overall homogeneity 13/ can be expressed as

    $F[df_1, df_2] = \dfrac{(SSRP - \Sigma SSR) / df_1}{\Sigma SSR / df_2}$

where  SSRP is the SSR from the regression involving pooled data, 14/

       $\Sigma SSR$ is the summation over p classes of the
       individual SSR's,

       $df_1$ is the degrees of freedom for the numerator
       which is the number of constraints
       imposed = k(p-1), 15/

and    $df_2$ is the degrees of freedom for the denominator
       which is the sum of the degrees of
       freedom for each of the p individual
       regressions.

13/ The other F-statistics can be defined analogously following
the formulae given in equations (20) above. This is case (4).

14/ Denoted A above.

15/ In general, the degrees of freedom for the numerator are the
number of equal signs in $H_0$, or the number of constraints imposed.
For example, ($\beta_1$ in class 1) = ($\beta_1$ in class 2) is one restriction
in the pooled regression, etc.




Again, these F-statistics must be calculated by the analyst. 
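The overall homogeneity test for p classes of unequal size can be sketched in Python as follows. The function name `f_overall_homogeneity` and every number are hypothetical; the only inputs needed are the pooled SSR, the SSR of each class regression, the class sizes and k.

```python
def f_overall_homogeneity(ssr_pooled, ssr_by_class, n_by_class, k):
    """Return (F, df1, df2) for equal slopes and intercepts across classes."""
    p = len(ssr_by_class)
    sum_ssr = sum(ssr_by_class)
    df1 = k * (p - 1)                           # constraints imposed
    df2 = sum(n_i - k for n_i in n_by_class)    # summed class-level df
    f_value = ((ssr_pooled - sum_ssr) / df1) / (sum_ssr / df2)
    return f_value, df1, df2


# Three classes of unequal size, k = 2 parameters per regression.
f_value, df1, df2 = f_overall_homogeneity(
    ssr_pooled=200.0,
    ssr_by_class=[50.0, 60.0, 40.0],
    n_by_class=[20, 25, 15],
    k=2,
)
print(round(f_value, 4), df1, df2)
```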

Note that it is also possible to combine the analysis of 
variance and the analysis of covariance to test for slope homo- 
geneity of a subset of coefficients. To do this, interactive 
dummy variables must be introduced into the pooled regressions in 
order to allow the unrestricted slopes to vary over the classes. 
The SSR from these regressions are then used in the F formulae 
(that is, in place of A and D) and the corresponding degrees 
of freedom calculated accordingly. 

The critical values for the F-test are usually presented in 
tables with the degrees of freedom of the numerator on one side and 
the degrees of freedom for the denominator on the other. Different 
tables are needed for different confidence levels. 

Further applications of the F-test will be encountered in 


the following chapter. In particular, the 


(a) Farrar-Glauber test for multicollinearity 


(b) Goldfeld-Quandt test for heteroscedasticity 


involve F-statistics which will be discussed in that chapter. The 


Durbin-Watson (d) test will also be discussed in Chapter VI. 




CHAPTER VI 
REGRESSION PROBLEMS IN PRACTICE 


6.1 Introduction 


This chapter is concerned with violations of the assumptions 
(i) through (iv) presented at the beginning of Chapter III. These 
assumptions can be violated in two possible ways: 

(a) theoretically -- that is, the model (or
equation) being estimated automatically leads to
a violation of the assumptions;


(b) statistically -- that is, the particular data 

being used may have characteristics that lead 

to an empirical violation of the assumptions. 

This is the more common of the two. 
Violation of these assumptions leads to problems because the ordinary
least squares (OLS) parameter estimates are usually no longer BLUE 1/ --
that is, they may no longer be "best" (minimum variance) or unbiased.
If they are not the minimum variance estimates but are unbiased, they
are referred to as inefficient estimates, and the method of generalized
least squares (GLS) 2/ can be used to efficiently estimate the parameters
of the model. If they are biased, there is little that the analyst can
do except to attempt to eliminate the cause of the bias or to ascertain
the possible direction of the bias.


1/ Best, linear, unbiased estimates; see Chapter III, especially 
Section 3.1. 


2/ See Chapter VII for a further discussion. 




The problems that arise because of the different violations 


of the assumptions can be categorized under five headings: 


1. Multicollinearity 
2. Heteroscedasticity 
3. Autocorrelation 
4. Misspecification 


5. Errors in Variables 


The remainder of this chapter will deal with each of these problems, 
outlining the nature of the problem, its detection and possible solu- 


tions. Chapter VII is devoted to a further discussion of GLS. 


6.2 Multicollinearity 


This problem arises when any independent variable is a linear
combination of any other independent variable(s). An example occurs
when four seasonal dummies ($d_1$ to $d_4$) are introduced into a
regression as independent variables, in which case the relationship cannot
be estimated with an intercept since

    $d_1 + d_2 + d_3 + d_4 = x_1$

both of which are n-element column vectors of ones. 3/ This is a
particular type of linear combination, since it involves simple
addition with no constant and no weighting (or coefficients) on the
different independent variables. However, the problem would also
arise if, for example,

    $x_2 = 15 - 0.2 x_3$

or

    $x_3 = -0.05 + 32 x_2 - 0.9 x_4 + 82.1 x_5$

since both are linear combinations of some other independent
variables. Note that the problem does not automatically arise if, for
example, $x_j$ and $x_j^2$ are introduced into the same regression as
separate independent variables, since this is not a linear
combination. 4/

3/ This problem was mentioned in Section 4.13.


The assumption in question in these cases is assumption:

    (iv) X has rank k < n.

It should be apparent that either $x_2$ or $x_3$ in the above examples
does not contain any additional information since the two simply
(linearly) combine information which already exists in the other
independent variables. The reason for the assumption in the first
place is that the basic results (8) and (9) in Chapter III require
the inversion of the [k×k] matrix X'X. If X is not of rank k,
this inversion is not possible (and the computer returns a DOMAIN
ERROR).

4/ See Sections 4.6, 4.11 and 5.4 for discussions of these types
of variables.


The real problem arises not when the linear dependence is 
exact and the computer returns an error message, but when the linear 
dependence is strong but not exact (that is, when the above examples 
hold with some error) and the computer returns some results. It is 
this problem to which the term multicollinearity is usually applied. 
The task of the analyst is then to determine whether or not the 


estimated results are meaningful. This raises two important issues: 


a) the detection of multicollinearity, and 


b) the existence of harmful multicollinearity. 


In other words, even if multicollinearity (in the statistical sense 
described above) exists, it does not necessarily follow that correc- 
tive action is called for -- the existing parameter estimates may 
be reasonably accurate. 

Multicollinearity may be detected by an examination of the
standard errors of the estimated parameters (the $S_j$'s). Under
these conditions the $S_j$ tend to be large. Consequently, the $t_j$
tend to be small, leading to the conclusion that a coefficient is
not significantly different from zero. The danger then is that the
analyst may incorrectly discard the associated $x_j$ variable from the
equation because its coefficient is not significantly different from
zero by the standard t-test. Moreover, because the $S_j$ are large,
the specific parameter estimates obtained may fall within a very
large range and may therefore be meaningless. Finally, as is noted
by Johnston, 5/ these estimates become very sensitive to particular
sets of sample data, and the inclusion of additional observation(s)
can produce dramatic shifts in some of the estimates. Remember,
multicollinearity problems can only arise if there are at least two
independent variables (in addition to the intercept, that is k ≥ 3) 6/
and can involve any linear combination of the independent variables.
Moreover, the large standard errors on the parameter estimates (or
low t's) should provide an adequate warning whenever the problem
exists. A common indication of the problem is an equation with a
relatively high $R^2$ but with no (or few) significant coefficients.
There is no statistical test for multicollinearity -- if it
is exact, the problem cannot be solved; if it is approximate, the
problem is to determine whether or not it is 'harmful'. This is a
subjective evaluation usually based on the $t_j$'s; however, two
pieces of information can be useful:


(1) the correlation matrix displays the simple correlation 
between any two variables and, in particular, between any two indepen- 


dent variables. This can be used to test simple relationships of 


5/ See Johnston, p. 160. 


6/ Although, of course, the intercept may be restricted to be 
zero; see Section 4.15. 


107 


the form $X_j = \beta_1 + \beta_2 X_\ell$ ($j \neq \ell$), where $X_j$ and $X_\ell$ are any two
independent variables. In this case $r_{j\ell}^2 = R^2$ for the relation
in question. As a rough rule, $|r_{j\ell}| > 0.9$ suggests that the analyst
should be prepared for harmful multicollinearity -- but it must be
emphasized that this is only a very rough guide. It might seem
that the associated t-test7/ would be useful for this purpose.
However, a correlation coefficient which is significantly different
from zero (by no means an unusual occurrence) does not necessarily
produce harmful multicollinearity.


(2) the Farrar-Glauber equations8/ consider the more general
relationship between each $X_j$ and all (or a subset) of the remaining
(k-1) independent variables. Using the coefficient of multiple
correlation ($R_j$) between each $X_j$ and the remaining (k-1) independent
variables, the (k-1) F-statistics

$$F_j[k-2,\; n-k+1] = \frac{R_j^2/(k-2)}{(1-R_j^2)/(n-k+1)}$$

with (k-2) and (n-k+1) degrees of freedom, determine the overall significance
of linear relationships among the $X$'s. These are the F-statistics for the
significance of the overall regression that appear in the computer printout.


There will be (k-1) such regressions and F values to consider, one 


7/ See Chapter V, Section 5.3, t-test (iv). 

8/ See D.E. Farrar and R.R. Glauber, "Multicollinearity in Regression 
Analysis: The Problem Revisited", The Review of Economics and 
Statistics, Vol. 49 (1967), pp. 92-107. 


108 


for each independent variable. An inspection of the $R_j^2$ will show
which independent variables are best explained by the remaining
independent variables; the highest $F_j$ indicate which $X_j$'s might
be causing the problem. These equations can be a useful guide to
discovering the cause of the problem, as well as of some assistance
in deciding whether or not the existing multicollinearity is harmful,9/
but they are of no use with respect to providing a solution.

The remedy for multicollinearity is the acquisition of further 
information, if possible. For example, one might obtain a prior (or 
extraneous) estimate (denoted $\hat{\beta}_j$) of the coefficient of one of the
explanatory variables causing the problem (usually from other research
on the same subject) and re-estimate the relation using $(y - \hat{\beta}_j X_j)$
as the dependent variable; or, obtain another measure (that is, 
different data) of the independent variables involved; or perhaps 
re-examine the basis of the original specification to see if some 
alternative specification of the relationship may be equally satisfac- 
tory. One common method used with regard to the latter acquisition 
of more information is to normalize (that is, divide) by one of the 
problem independent variables. However, this can be a dangerous 
practice, since often the analyst inadvertently changes the specifi- 


cation. As an illustration consider the relationship 


$$y = \beta_1 + \beta_2 X_2 + \beta_3 X_3$$


9/ Very high FY values (e.g., over 100) will suggest harmful 
multicollinearity. 


109 


and suppose that harmful multicollinearity exists between $X_2$ and
$X_3$. Normalizing by (say) $X_3$ produces

$$\frac{y}{X_3} = \beta_1 \frac{1}{X_3} + \beta_2 \frac{X_2}{X_3} + \beta_3$$

Note that $\beta_3$ is the 'new' intercept in this regression; more
importantly, note that this is NOT the same as

$$\frac{y}{X_3} = \beta_1 + \beta_2 \frac{X_2}{X_3}$$

which is the relationship often mistakenly estimated. Estimation
of this latter relationship may involve a specification error, which
is an even more serious problem.10/

Finally, it is important to emphasize that it should first be 
ascertained that multicollinearity is in fact a problem before a 
remedy is sought. That is, it should be checked that the problem 
exists and that it is harmful in the sense that important variables 
have meaningless or insignificant coefficients. In addition, it 
should be determined whether the problem has important consequences 
with respect to the particular analysis contemplated. Often, if the analyst
is interested in short-term forecasting, an estimated relationship
with sensible, even though insignificant (low t-values) coefficients 
might produce very satisfactory results. Also, under certain condi- 


tions, multicollinearity may produce offsetting errors in two (or 


10/ See Section 6.5 in this chapter. Notice also that the error 
structure has been changed. 


110 


more) parameter estimates so that their total is estimated quite
accurately.11/ If the total is all that is of interest then
multicollinearity may not be a serious problem. But if it is a problem,
the only remedy is the collection of additional information; if
this is not possible it becomes a problem that the analyst must
'learn to live with'.


6.3 Heteroscedasticity 


This problem arises when the variance of the disturbance 
term (estimated in the SEE) is not constant over all the observa- 
tions. It is a common problem since large observations are often 
expected to be associated with larger variances. For example, in 
time-series data where variables often grow with the passage of 
time, or especially in cross-section data where observations (e.g., 
families, firms, or departments) are arranged in order of increasing 
size (however measured), the analyst might expect greater absolute
variation in the observations at the higher end of the series than 
at the lower end. Note that ordering is not essential (it only 
makes understanding a little easier) and that it is absolute differ- 
ences and not relative (to size) differences that are important. 

The assumption under question in this case is assumption

(ii)   $E(uu') = \sigma^2 I_n$ ,


11/ Johnston, pp. 161-162 gives an example. Another example is 
determining whether a Cobb-Douglas production function has 
constant returns to scale, in which case the null hypothesis 
is only concerned with the sum of the relevant parameter esti- 
mates {see page 79 above). 


111 


which implies that the SEE is constant over the entire relationship. The
problem now lies in the errors and not in the $X_j$ variables. Since u is
[nx1], uu' is an [nxn] matrix. Therefore, suppose assumption (ii) is
replaced by a more general assumption12/

$$E(uu') = \sigma^2 Q$$

where Q is an [nxn] symmetric, positive definite matrix. For the
moment assume that it has non-constant diagonal elements. This is what is
meant by heteroscedasticity. If the other three assumptions are retained,
it can be shown13/ that the most efficient parameter estimates are obtained
from

$$\hat{\beta} = (X'Q^{-1}X)^{-1} X'Q^{-1} y , \qquad (21)$$

with variance-covariance matrix

$$\mathrm{var}\,\hat{\beta} = \sigma^2 (X'Q^{-1}X)^{-1} . \qquad (22)$$


In order to obtain these estimates, Q must be a prespecified known matrix
and $\sigma^2$ must be replaced by an estimate, $S^2$, calculated as

$$S^2 = \frac{(y - X\hat{\beta})' Q^{-1} (y - X\hat{\beta})}{n-k} . \qquad (23)$$

Since it is virtually impossible to know the underlying variances of the
true errors in the true relationship, the diagonal elements of Q are
usually estimated in some way (see below). These are called the generalized
least squares (GLS) estimators and will be discussed further in Chapter VII.


12/ See Johnston, p. 208, equation (7-1). These results also will be used 
in the next section on Autocorrelation and subsequently in Chapter VII. 


13/ See Johnston, p. 210. Compare these with results (8), (9) and (10) 
in Chapter III. 


112 


Under these conditions the ordinary least squares estimates 
(result (9)) are LUE but not BLUE; that is, they are not the best 
or minimum variance estimates. Their use could result in a loss 
of efficiency. A comparison of approaches to the problem when hetero- 


scedasticity exists involves three variance-covariance matrices: 
(i) Estimated OLS

$$\mathrm{var}\,\hat{\beta} = \sigma^2 (X'X)^{-1} \qquad (9)$$

This is produced by the computer if OLS instead of GLS is used.

(ii) True OLS14/

$$\mathrm{var}\,\hat{\beta} = \sigma^2 (X'X)^{-1} X'QX (X'X)^{-1}$$

This is what should be estimated if OLS instead of GLS is used.

(iii) GLS

$$\mathrm{var}\,\hat{\beta} = \sigma^2 (X'Q^{-1}X)^{-1} \qquad (22)$$

This is what should, in fact, be estimated since GLS is appropriate.
These form the basis for two theorems:

(a) The estimated OLS variances are less than the
true OLS variances (since Q is a positive
definite matrix).15/ That is, var(i) < var(ii).


14/ See Johnston, p. 215, result (7-31). 


15/ Johnston does not illustrate this case. See Goldberger, Econo- 
metric Theory, p. 240. 


113 


(b) Since GLS estimates are BLUE they are minimum 
variance. Thus, GLS variances are less than the 
true OLS variances. 16/ That is, var(iii) < var (ii). 


Unfortunately, there is no theorem linking (iii) with (i), i.e.
linking the desired GLS variances with those variances obtained
by inadvertently applying OLS on the computer. From the two theorems
we can see that any relationship is possible (var(i) $\gtrless$ var(iii)).
Consequently it is not possible to deduce anything from the OLS
estimates of $S_j$, and particularly the associated $t_j$ values, if
GLS should have been used. The application of GLS is the only solution
under these conditions.

There is no one satisfactory test for the assumption of 
homoscedasticity. Three tests will be discussed here. It is probably 
a good idea for the analyst to conduct more than one, if possible, 


to provide greater confidence in the conclusion. 


(1) The Goldfeld-Quandt F-test assumes a particular type of
heteroscedasticity, testing the null hypothesis of homoscedasticity
against the alternative hypothesis of

$$H_1:\; E(uu') = \sigma^2 X_j^2$$

that is, the variance increases with the square of any one of the


16/ See Johnston, p. 216. 


114 


independent variables. Ordering the observations in accordance
with the size of the chosen $X_j$, omitting the middle 25% (denoted
$c$) of the observations,17/ applying OLS separately to the first and
last (n-c)/2 observations (assuming [(n-c)/2]>k), and taking the
ratio of the SSR of the regression on the larger $X_j$ values to the
SSR of the regression on the smaller $X_j$ values, results in the
F-statistic

$$F\left[\frac{n-c-2k}{2},\; \frac{n-c-2k}{2}\right] = \frac{SSR(\text{larger})}{SSR(\text{smaller})}$$

with [(n-c-2k)/2, (n-c-2k)/2] degrees of freedom. Note that both
numerator and denominator have the same degrees of freedom so it is
not necessary that they appear in the calculation on the right-hand
side. A low F-value leads to the acceptance of homoscedasticity,
while a high F-value leads to the acceptance of the alternative
hypothesis of heteroscedasticity of the particular form specified.
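
The mechanics of the test can be sketched in a few lines of Python. This is only an illustration for the simplest case of one independent variable plus an intercept (k = 2); the function names and the artificial data are assumptions made for the example and do not correspond to any program in the REGRESSION workspace.

```python
def ssr_simple_ols(x, y):
    """Sum of squared residuals from an OLS fit of y on an intercept and x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, c):
    """Return (F, degrees of freedom) after omitting the c middle observations."""
    n = len(x)
    if (n - c) % 2 != 0:
        raise ValueError("n - c must be even")
    m = (n - c) // 2                      # observations in each outer group
    if m <= 2:
        raise ValueError("each group needs more than k = 2 observations")
    pairs = sorted(zip(x, y))             # order by the size of x
    low_x, low_y = zip(*pairs[:m])        # group with the smaller x values
    high_x, high_y = zip(*pairs[n - m:])  # group with the larger x values
    f_ratio = ssr_simple_ols(high_x, high_y) / ssr_simple_ols(low_x, low_y)
    return f_ratio, m - 2                 # (n - c - 2k)/2 with k = 2


# Artificial data: the error term grows in proportion to x.
x = list(range(1, 21))
signs = [1, -1, -1, 1]
y = [2 + 3 * xi + xi * signs[i % 4] for i, xi in enumerate(x)]
f_ratio, df = goldfeld_quandt(x, y, c=4)
```

The resulting ratio is then compared with the tabulated F value for (df, df) degrees of freedom at the chosen significance level.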





It is useful to illustrate this test diagrammatically. In 


Figure 1, the relevant range 





Figure 1 


17/ This is only an approximate rule. For example, set c=8 when n=30,
c=16 when n=60, etc. See Johnston, p. 219. Note that c here should
be an odd number if n is an odd number. H. Theil, Principles of
Econometrics, p. 198 argues that c should equal 1.


115 


of observations for $X_j$ is divided into three groups (1, 2, and
3). The dotted lines indicate the type of heteroscedasticity being 
tested. Separate regressions are then estimated on the observations 
in groups 1 and 3. The F-statistic is the ratio of the SSR from 
group 3 to the SSR from group 1. Since the degrees of freedom are 
the same in each group (see above), this is equivalent to comparing 
the SEE's in each regression, which is equivalent to comparing the 
distance AB with the distance CD in Figure 1. Clearly, if AB is 
much larger than CD, then it is likely that there is heteroscedas- 
ticity of the type indicated by the dotted line; if AB and CD are 
of approximately equal size then the assumption of homoscedasticity 
should not be rejected. However, a problem can arise because sepa- 
rate regressions are estimated on the two groups; this is illustrated 
in Figure 2. The F-test result would be exactly the same in both 
figures (since the SEE's and hence the SSR's are identical), but 
clearly in Figure 2 the null hypothesis is not testing for homosce- 
dasticity around the same structural relationship (since the slopes 
in each group are quite different). Consequently, a more sensible 
approach appears to be the following: order the observations in 
accordance with the size of the chosen $X_j$, and omit the middle
25% (roughly) as before. Now combine (or pool) the observations in 
groups 1 and 3, apply OLS once to the remaining (n-c) observations, 
obtain the estimated errors, calculate the SSR for groups 1 and 3 


(the first and second halves of the sample, respectively), and apply 


116 


the F-ratio as before. This approach has the effect of ensuring
that the same structural relationship exists in both groups.18/


(2) The Spearman rank correlation coefficient is a correlation
coefficient, but in this case it is between the rank of the absolute
value of the residuals from OLS and the rank of that $X_j$
variable with which the variance might be associated; that is,
assign the value 1 to the smallest absolute residual and smallest
$X_j$, the number $n$ to the largest residual and $X_j$, and fill in
the intermediate ranks for both series. The correlation coefficient
between these two rankings can be tested with the relevant t-test.19/
This test can be extended to include the size and not just the order
of the estimated residuals and $X_j$'s by computing the correlation
matrix between the absolute value of the residuals (denoted $|e_i|$)
and the relevant independent variable(s), and applying the same t-test.
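
The ranking procedure can be sketched as follows in Python. The sketch assumes there are no tied values, and it assumes that the relevant t-test is the usual one for a correlation coefficient, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with (n-2) degrees of freedom; the function names and the five observations are purely illustrative.

```python
import math


def ranks(values):
    """Rank 1 for the smallest value up to n for the largest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_test(abs_residuals, x):
    """Rank correlation between |e_i| and x, with its t-value.

    The t-value is undefined when the rank correlation is exactly +1 or -1.
    """
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(abs_residuals), ranks(x)))
    r_s = 1.0 - 6.0 * d2 / (n * (n * n - 1))
    t_value = r_s * math.sqrt(n - 2) / math.sqrt(1.0 - r_s ** 2)
    return r_s, t_value


# Illustrative absolute residuals and the variable suspected of driving the variance.
abs_resid = [0.5, 1.0, 2.0, 1.5, 3.0]
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
r_s, t_value = spearman_test(abs_resid, x_values)
```

A significantly positive rank correlation suggests that the absolute size of the residuals rises with the chosen variable.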


(3) The Glejser regressions extend this idea even further
by suggesting the estimation of regressions of the form

$$|e| = \alpha + \beta X_j^{\gamma}$$

where $\gamma$ is a parameter which is the basis for iteration. By
using a non-linear estimation technique (see Appendix) or,
alternatively, by giving $\gamma$ systematic values between -2 and +2
(say), the decision on homoscedasticity of residuals depends on the


18/ Note that no proof has been supplied for this approach; it 
is only a suggestion which seems sensible. 


19/ See Section 5.3, test (iv). 


117 


statistical significance of the estimated coefficients $\alpha$ and
$\beta$. It is not clear whether the F or t-test should be used for
this purpose. Since it is the overall relationship, including
the intercept, which is of interest, an F-test seems the more
appropriate;20/ however, we may also be interested in distinguishing
pure heteroscedasticity ($\alpha = 0$, $\beta \neq 0$) from mixed heteroscedasticity
($\alpha \neq 0$, $\beta \neq 0$), in which case the t-test is the relevant test.
Johnston notes that there are sometimes problems with this test.21/

If heteroscedasticity of one form or another is suggested,
the problem remains as to how to specify the [nxn] matrix Q
to be used for the GLS estimates. One commonly used method is to
place the square of each of the residuals ($e_i^2$) from an OLS regression
on the diagonals (leaving zeros elsewhere), but this does not
necessarily adequately reflect the variance. To get a better measure
of the variance, moving averages of subsets of ordered squared residuals
are often used. If Glejser equations are estimated and suggest
heteroscedasticity, for a given $\gamma$, then the squares of the predicted
values from the chosen equation can be used as the diagonal terms
of the matrix (see Section 7.2).

Alternatively, the original specification might be trans- 


formed in such a way that heteroscedasticity is eliminated. 


20/ Since this hypothesis includes the intercept, it is not the 
F-statistic generated in the standard regression output. 


21/ See Johnston, pp. 220-221. 


118 
For example, if

$$y = \beta_1 + \beta_2 X_2 + u$$

and

$$E(uu') = \sigma^2 X_2^2 ,$$

then define

$$\frac{y}{X_2} = \beta_1 \frac{1}{X_2} + \beta_2 + v , \qquad v = \frac{u}{X_2} ,$$

and

$$E(vv') = \sigma^2 I_n .$$

Consequently OLS can be correctly applied to the transformed relationship,
where $\beta_2$ is now the intercept and $\beta_1$ the slope.

In general, if $E(uu') = \sigma^2 X_j^{\gamma}$, then the dependent variable and
every independent variable, including the intercept, should be divided
by $X_j^{\gamma/2}$ and OLS applied to the transformed relationship to obtain
the BLUE of the parameters. Note that only in the case when $\gamma = 2$
(as discussed above) will there be an intercept in the transformed regression.


The interpretation of the parameter estimates in the transformed 


119 


regression can be easily traced back to the parameters in the ori- 


ginal specification. 
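
The transformation itself is purely mechanical, as the following Python sketch suggests. It assumes a single independent variable with strictly positive values and an error variance proportional to that variable raised to the power gamma; the function name and the data are hypothetical.

```python
def transform_observations(y, x, gamma):
    """Divide y, the intercept column and x by x**(gamma/2).

    Returns one row per observation:
    (transformed y, transformed intercept column, transformed x).
    Assumes every x value is strictly positive.
    """
    rows = []
    for yi, xi in zip(y, x):
        w = xi ** (gamma / 2.0)
        rows.append((yi / w, 1.0 / w, xi / w))
    return rows


# With gamma = 2 the transformed x column is constant, so it plays the
# role of the intercept and its coefficient is the original slope.
rows = transform_observations([3.0, 9.0, 27.0], [1.0, 4.0, 9.0], gamma=2)
```

OLS would then be applied to the transformed columns without adding a further constant term.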


6.4 Autocorrelation 


This problem arises when the errors ($u_i$) are not independent.
This assumption was also embodied in assumption (ii), which not
only specified constant variance, but also included zero covariance;
that is

$$E(u_i u_{i+s}) = 0 \quad \text{for all } i \text{ and for all } s \neq 0 .$$

For time-series this means serial independence of errors and its 
violation is often referred to as serial correlation. It is in this 
context that the problem usually arises, where an error in one period 
is related to an error in a previous period. If it is the error in the 
previous period it is known as first order autocorrelation, that is 


$$E(u_t u_{t-1}) \neq 0 \quad \text{for all } t ,$$

or alternatively,

$$u_t = \rho u_{t-1} + v_t \qquad (24)$$

where $|\rho| < 1$ to avoid an explosive system and $v_t$ satisfies assumptions
(i) and (ii) (as in the previous section).

Since the problem is very similar to that of heteroscedasti- 
city, the consequences and the remedies are the same. The [nxn] 


matrix Q can now be assumed to have some nonzero off-diagonal 


120 


elements (since nonzero covariances now exist), and the most efficient 
parameter estimates are again the GLS estimates given by (21) and 
(22). As was the case with heteroscedasticity, OLS estimates are 
LUE but not BLUE, and the estimated OLS variances are less than the 
true OLS variances,22/ which in turn are greater than the GLS estimates.
Thus, once again the $S_j$ and associated $t_j$ estimated by
the computer provide the analyst with little guide to appropriate
action. 

The conventional test for autocorrelation is the Durbin-
Watson test which tests for first order autocorrelation only.
It is based on the d-statistic defined as

$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \qquad \text{where } 0 \leq d \leq 4. \qquad (17)$$

A $d = 2$ means no autocorrelation, a $d < 2$, positive autocorrelation
($\rho > 0$) and a $d > 2$, negative autocorrelation ($\rho < 0$). To carry
out the formal test, upper ($d_U$) and lower ($d_L$) limits for significance
levels of d are established. The hypotheses are:

$$H_0:\; \rho = 0$$
$$H_1:\; \rho > 0 \ \text{(if } d < 2\text{)} \quad \text{or} \quad \rho < 0 \ \text{(if } d > 2\text{)}$$


22/ See the example in Johnston, pp. 247-248. Section 8-2 gives 
a good account of the consequences of autocorrelation; they 
are not repeated here. 


121 


If $d < 2$, $d$ is compared with $d_L$ and $d_U$. In this case,

   if $d < d_L$, reject $H_0$ in favour of $H_1$: $\rho > 0$,

   if $d > d_U$, do not reject $H_0$,
and

   if $d_L < d < d_U$, the test is inconclusive.

If $d > 2$, calculate (4-d) and refer to this value as if testing
for positive autocorrelation, with the exception that, when (4-d)
$< d_L$, the alternative hypothesis becomes $H_1$: $\rho < 0$. The tables
which determine the lower and upper significance levels $d_L$ and
$d_U$ depend upon n and k (actually k-1),23/ and the choice of
the confidence level is again dependent on the probability of making 
a type I error (that is, the probability of rejecting a correct null 
hypothesis). The inconclusive range results from the fact that the 
errors used in the calculation of d are the estimated rather than 
the (unknown) true errors. 
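
The calculation of d and the decision rule just described can be sketched in Python as follows. The function names are illustrative, and the limits passed to `dw_decision` in any application must be taken from the published tables for the relevant n and k-1; none are built into the sketch.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Apply the three-way rule; values of d above 2 are tested through (4 - d)."""
    if d <= 2.0:
        stat, alternative = d, "positive autocorrelation"
    else:
        stat, alternative = 4.0 - d, "negative autocorrelation"
    if stat < d_lower:
        return alternative          # reject H0 in favour of H1
    if stat > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Residuals that change sign once (long runs) and at every observation.
d_pos = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
d_neg = durbin_watson([1.0, -1.0, 1.0, -1.0])
```

Long runs of residuals of the same sign give a d well below 2, while residuals that alternate in sign give a d well above 2.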

For large samples (say n > 100), the von-Neumann ratio can
be used. It is approximately normally distributed with mean
$2n/(n-1)$ and variance $4n^2(n-2)/((n+1)(n-1)^3)$; tests are therefore
made with a preselected confidence interval from the normal
distribution with appropriate mean and variance; that is, calculate

$$Z = \frac{vN - 2n/(n-1)}{\sqrt{4n^2(n-2)/((n+1)(n-1)^3)}}$$


23/ This is denoted by k' in the tables in Johnston, pp. 430-431. 
It is the number of explanatory variables or parameters 


excluding the intercept. 


122 


where vN is the vN ratio defined as

$$vN = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2 / (n-1)}{\sum_{i=1}^{n} e_i^2 / n}$$

and refer to the standardized normal tables for Z. A high positive
Z (greater than the critical value) indicates significant negative
autocorrelation, whereas a high negative Z (less than the critical
value) indicates significant positive autocorrelation. Note that
there is a unique relationship between the two tests since

$$n\,d = (n-1)\,vN .$$
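
The arithmetic of the ratio, the Z value and the link with d can be checked with a short Python sketch. The four residuals used here are far too few for the normal approximation to apply; they serve only to make the arithmetic easy to follow, and the function name is an assumption of the example.

```python
import math


def von_neumann(residuals):
    """Return (vN, Z) for a list of estimated residuals."""
    n = len(residuals)
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / (n - 1)
    den = sum(e * e for e in residuals) / n
    vn = num / den
    mean = 2.0 * n / (n - 1)
    variance = 4.0 * n ** 2 * (n - 2) / ((n + 1) * (n - 1) ** 3)
    return vn, (vn - mean) / math.sqrt(variance)


e = [1.0, -1.0, 1.0, -1.0]
n = len(e)
vn, z = von_neumann(e)

# Durbin-Watson d from the same residuals, to check that n*d = (n-1)*vN.
d = sum((e[i] - e[i - 1]) ** 2 for i in range(1, n)) / sum(v * v for v in e)
identity_gap = n * d - (n - 1) * vn
```

The alternating residuals give a positive Z, which is the direction associated with negative autocorrelation.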


One important warning -- the d-statistic should not be refer- 
red to if there is a lagged dependent variable in the regression 
because it is biased towards accepting the null hypothesis 
of no autocorrelation under these conditions.24/ However, if the 
estimated d is still below $d_L$ then it is clear that the alternative
hypothesis cannot be rejected, even with a lagged dependent
variable included in the relation. In this case and only in this 


case is a definite conclusion possible under these conditions. 


24/ See Johnston, p. 252 and pp. 307-312. A large sample test when 
lagged dependent variables are present has been developed by 
Durbin. Called an h-statistic, it is related to the d-statistic 
and is tested as a standardized normal deviate. See Johnston,
pp. 312-313. 


123 


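As with heteroscedasticity, autocorrelation problems can be
handled in two ways. First, an estimate of the Q matrix can be
obtained25/ in a GLS framework,26/ based on an estimate of $\rho$
(denoted $\hat{\rho}$):

$$\hat{Q} = \begin{bmatrix}
1 & \hat{\rho} & \hat{\rho}^2 & \cdots & \hat{\rho}^{n-1} \\
\hat{\rho} & 1 & \hat{\rho} & \cdots & \hat{\rho}^{n-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\hat{\rho}^{n-1} & \hat{\rho}^{n-2} & \hat{\rho}^{n-3} & \cdots & 1
\end{bmatrix} \qquad (25)$$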


The implementation of this approach is discussed further in Section 7.2. 
Alternatively, the parameters can be estimated using a transformed 


relationship. Consider the model (where the observations depend on 


time, t ) 


$$y_t = \alpha + \beta x_t + u_t , \qquad t = 1, 2, \ldots, T$$

where

$$u_t = \rho u_{t-1} + v_t$$

and $v_t$ satisfies the OLS error assumptions (i) and (ii). If this
relationship is lagged one time period and multiplied by $\rho$, so


25/ Johnston, pp. 246 and 259. Note that Johnston denotes $V = \sigma^2 Q$.


26/ See Section 6.3 above and Section 7.2 below. 


124 


that 


$$\rho y_{t-1} = \rho\alpha + \rho\beta x_{t-1} + \rho u_{t-1}$$

and if this is subsequently subtracted from the original relation
so that

$$(y_t - \rho y_{t-1}) = \alpha(1-\rho) + \beta(x_t - \rho x_{t-1}) + v_t \qquad (26)$$


then the errors in the resulting transformed equation satisfy the
OLS assumptions and it can therefore be estimated efficiently using
OLS.27/ This procedure, often referred to as generalized differencing,
is easily extended to cover relationships with more than one independent
variable.

If $\rho$ were known, it would be easy to apply GLS or OLS to the
transformed equation. In fact, various methods have been proposed to
estimate $\rho$ and to approximate the GLS procedure using equation (26).
All of them require an estimate of $\rho$. To estimate equation (26), it
is necessary to compute the dependent variable $(y_t - \hat{\rho} y_{t-1})$ and each
independent variable $(x_t - \hat{\rho} x_{t-1})$. The estimated coefficients on the
independent variables provide estimates of the slope coefficients in
the original relationship, while the original intercept is obtained by


27/ It should be noted that, as described, this transformed equation
is not identical to the GLS approach since it is only defined
for t = 2, ..., T; that is, the information in the first observation
is 'lost'. It can be shown (see Johnston, pp. 260-261) that
the equivalent transformation takes this into account by inserting
the transformed variables $y_1\sqrt{1-\rho^2}$ and $x_1\sqrt{1-\rho^2}$ into the first
observation before estimation by OLS.


125 


dividing the estimated intercept in relationship (26) by $(1 - \hat{\rho})$28/
(assuming that there is an intercept in the original specification,
that is, assuming that the default state setting CONSTANT in workspace
32 REGRESSION is in effect29/).


There are three common alternative methods to obtain an
estimate of $\rho$:

1) Calculate30/

$$\hat{\rho} = 1 - \frac{d}{2}$$

where d is the Durbin-Watson statistic from an OLS regression
of the original relation.

2) Estimate the autoregressive relationship (24), that is estimate

$$e_t = \rho e_{t-1} + v_t$$

where $e_t$ are the estimated residuals from the OLS regression.
The estimated coefficient of $e_{t-1}$ provided by this regression
is then the required estimate of $\rho$. Note that this regression
has no intercept and is estimated on (T-1) observations.


28/ However, this will not be an unbiased estimate of $\alpha$. The associated
standard error can be obtained by dividing the standard error
of the estimated combined coefficient in equation (26) by $(1 - \hat{\rho})$.
See J. Kmenta, Elements of Econometrics, p. 283.


29/ See Section 4.15 above. 
30/ See Johnston, p. 313. 


126 


3) Estimate a version of the transformed equation (26) with the lagged
dependent variable on the right-hand side, that is estimate

$$y_t = \alpha(1-\rho) + \beta x_t - \rho\beta x_{t-1} + \rho y_{t-1} + v_t \qquad (27)$$

and use the estimated coefficient on that variable ($y_{t-1}$) as $\hat{\rho}$
in estimating the transformed equation (26) as originally specified.
Again note that this equation is estimated on (T-1) observations.
Also note that although theoretically the coefficient on $x_t$ ($\beta$)
times that on $y_{t-1}$ ($\rho$) should equal, apart from sign, the coefficient
on $x_{t-1}$ ($\rho\beta$), there is no restriction included in the estimation
procedure which imposes this on the estimated parameters, i.e.
$(\hat{\rho}) \times (\hat{\beta})$ does not necessarily equal $(\widehat{\rho\beta})$.31/

Since all are only approximations, it is quite likely that
each of these methods will yield a somewhat different estimate
of $\rho$, although there should not be a large difference. Method 1)
is the easiest; method 2) is quite easy if a regression program
that restricts the intercept to zero is available; while method 3)
is a useful two-stage OLS procedure. The latter two methods can
easily be extended to higher order autoregressive schemes.32/ Since
$\hat{\rho}$ is only an estimate, there is no point using more than a couple


31/ This is because the estimated coefficient of $x_{t-1}$ is not
(an estimate of $\rho$) times (an estimate of $\beta$), but rather
an estimate of ($\rho$ times $\beta$). This point is considered further
later in this section.


32/ See Johnston, pp. 263-264. 


127 


of places after the decimal point in the calculations (e.g., 0.23 not 
0.227651). Also remember that all relationships with a one period lag are 
estimated without the first observation (that is, on (T-1) observations), 
unless the relevant data have been explicitly included in the data matrix. 
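
Methods 1) and 2), together with the generalized differencing that uses the resulting estimate, can be sketched in Python as follows. The function names and the short residual series are illustrative assumptions; only the value d = 0.70 is taken from the example discussed later in this section.

```python
def rho_from_dw(d):
    """Method 1): approximate rho from the Durbin-Watson statistic."""
    return 1.0 - d / 2.0


def rho_from_residuals(e):
    """Method 2): regress e_t on e_{t-1} with no intercept, using (T-1) observations."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den


def generalized_difference(series, rho):
    """Return z_t - rho * z_{t-1}; the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


rho_1 = rho_from_dw(0.70)
rho_2 = rho_from_residuals([1.0, 0.5, 0.25, 0.125])  # residuals that halve each period
y_star = generalized_difference([1.0, 2.0, 3.0], 0.5)
```

The same differencing function would be applied to the dependent variable and to each independent variable before OLS is run on the transformed data.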
To assist the analyst with the problem of autocorrelation two 
correction programs have been specifically developed based on the above 
methods. These can be found in the workspace 32 REGRESSION along with the 
REGR program (see page 35). They are illustrated by examples at the end 


of the following brief descriptions of the procedures. 


The COCHRANEΔORCUTT Program


This iterative procedure is based on method 2) for estimating
$\rho$. It involves the following steps. First, the equation

$$y_t = \alpha + \beta x_t + u_t$$

is estimated by OLS and a $\hat{\rho}$ is calculated according to method 2)
using the estimated residuals from this regression. Then the transformed
equation (26) is estimated by OLS using this estimate of $\rho$
to calculate the transformed variables (or generalized differences).
This equation yields new estimates of the structural parameters
($\alpha$ and $\beta$) from which a new vector of residuals based on the original
(not the transformed) equation can be calculated. These "second
round" residuals can be used to obtain a new estimate of $\rho$, again
according to method 2) above, and the whole procedure repeated.


128 


The iterative process may be carried on for as many steps as desired,
although the usual procedure is to terminate it after a maximum number
(say 15) of iterations or after the new estimates of $\rho$ differ from
the previous ones by less than some prescribed number (say 0.005). It
can be shown that the procedure is convergent, although there is no
guarantee that the final estimate of $\rho$ will be optimal in the sense
of minimizing the SSR since the procedure may converge on a local rather
than a global minimum.33/ The Cochrane-Orcutt program implements this
entire procedure automatically. An internal termination value of 0.005
on the successive absolute differences in the estimates of $\rho$ is
employed, although the user should specify the maximum number of iterations
desired in the event that this termination criterion is not achieved.
In most cases convergence is achieved in fewer than 10 iterations. The
output is illustrated as Example 5 below.
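
The steps of the iteration can be made concrete with a short Python sketch for the case of a single independent variable. This is not the APL program in the workspace; the function names, the default limits (taken from the figures quoted above) and the artificial data, in which a smooth cosine wave stands in for a positively autocorrelated error, are all assumptions made for illustration.

```python
import math


def ols(x, y):
    """Intercept and slope from OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def estimate_rho(e):
    """Method 2): regress e_t on e_{t-1} without an intercept."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(v * v for v in e[:-1])


def cochrane_orcutt(x, y, max_iter=15, tol=0.005):
    alpha, beta = ols(x, y)                 # first step: OLS on the original equation
    rho = 0.0
    for iteration in range(1, max_iter + 1):
        # Residuals are always taken from the original (untransformed) equation.
        resid = [b - alpha - beta * a for a, b in zip(x, y)]
        new_rho = estimate_rho(resid)
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, beta = ols(x_star, y_star)  # transformed equation (26)
        alpha = a_star / (1.0 - new_rho)    # recover the original intercept
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return alpha, beta, rho, iteration


# Artificial data: y = 10 + 2x plus a slowly oscillating error.
x = [float(t) for t in range(1, 41)]
y = [10.0 + 2.0 * xt + 0.5 * math.cos(xt) for xt in x]
alpha, beta, rho, n_iter = cochrane_orcutt(x, y)
```

Each pass loses the first observation in the transformed regression, so the estimates of the slope and intercept rest on (T-1) observations.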

Many variants on this procedure have been proposed including 
termination after two steps, termination when the final transformation 
regression shows no autocorrelation (e.g., Johnston, p. 263), and the 
use of 'better' starting estimates for $\rho$. For example, if the original
equation has already been estimated by OLS (which is usually the case 
since that is when the problem is often first discovered), then method 


1) or 3) described above could provide alternative starting estimates. 


33/ This depends on the starting point which, in the procedure described
above, is $\hat{\rho} = 0$ since the first step is to estimate the original
equation by OLS under the assumption that no autocorrelation exists.


129 


The latter is the basis of an alternative estimation procedure proposed 
by Durbin and has the advantage that it can be easily extended to higher 
order autocorrelation schemes.34/

The HILDRETHΔLU Program

This procedure also employs an iterative method of determining an
estimate of $\rho$. It requires that a "grid" of values be specified for
$\rho$ and then searches over this grid to find the set of parameter values
that minimizes the SSR from the transformed equation (26). Any limits
(within the range $-1 < \rho < 1$) and spacing arrangement for the grid values
may be chosen and, consequently, by repeating the procedure with
successively narrower spacing, it is possible to ensure that a global
minimum is achieved with maximum accuracy. Again, however, a starting
estimate of $\rho$ near the final estimate can save time. Consequently,
the user is asked to submit a value for that d which resulted from
the OLS regression which suggested significant autocorrelation. Method
1) above is then used to obtain an initial estimate of $\rho$, unless a
value of -1 has been entered, in which case method 3) is used. The
program then iterates on $\rho$ in an effort to obtain that transformed
equation which has the smallest SSR (or highest $R^2$). Since an iterative
procedure is used, it is necessary to specify a value by which $\rho$ is
to be incremented (or decremented). Note that it is quite possible for
there to be no minimum within the range $-1 < \rho < 1$, in which case the


34/ See Johnston, pp. 263-264. 


130 


final estimate of $\rho$ will be given as +0.95 if 0.05 has been defined
as the increment, +0.995 if 0.005 has been defined as the increment, etc.
This is a meaningless estimate of $\rho$.
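
The grid search can be sketched in Python for the case of one independent variable. The sketch simply scans the whole grid from -0.95 to +0.95 rather than starting from an initial estimate as the workspace program does, so it illustrates the principle only; the function names and the artificial data are assumptions of the example.

```python
import math


def ols_ssr(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, SSR)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, ssr


def hildreth_lu(x, y, step=0.05):
    """Return (rho, alpha, beta) for the grid value of rho with the smallest SSR."""
    best = None
    steps = int(round(1.0 / step))
    for j in range(-steps + 1, steps):       # grid strictly inside (-1, 1)
        rho = j * step
        x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        a_star, beta, ssr = ols_ssr(x_star, y_star)   # transformed equation (26)
        if best is None or ssr < best[0]:
            best = (ssr, rho, a_star / (1.0 - rho), beta)
    return best[1], best[2], best[3]


# Artificial data: y = 10 + 2x plus a slowly oscillating error.
x = [float(t) for t in range(1, 41)]
y = [10.0 + 2.0 * xt + 0.5 * math.cos(xt) for xt in x]
rho, alpha, beta = hildreth_lu(x, y)
```

If the chosen value lies at either end of the grid, no interior minimum has been found and the estimate should not be used. Repeating the search with a finer step around the chosen value refines the estimate.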

Examples 5 and 6 provide illustrations of the execution and 
resulting output for each of these programs using the same specification 
and data. Two quarterly time-series have been retrieved from the CANSIM 
Mini Base;35/ these represent personal expenditures on consumer goods and
services and personal disposable income, both seasonally adjusted and
expressed in constant (1971) dollars, for the years 1964 through 1975.
An OLS regression (using expenditures as the dependent variable) indicated
significant positive first-order autocorrelation (d = 0.70 in Example 4),
where, for T = 48 and k = 2 (including the intercept), $d_L$ = 1.41
at the 5 per cent level of significance. Note that the estimated value
of d is entered as an input into the Hildreth-Lu program. The final
estimated values for $\rho$ and for the original coefficients ($\alpha$ and $\beta$) are
very close and the total degrees of freedom is given as 47 because the
transformed equation (26) is estimated on (T - 1) observations in each
program. Note that the estimated standard errors for $\rho$ provide information
for another test for autocorrelation (besides the d-statistic) which
has been outlined above in Section 5.3, test (iv). Note also that the
resulting d-statistic is fairly close to two, suggesting no autocorrelation. 


This is to be expected, since the d-statistic is biased towards accepting 


the null hypothesis of no autocorrelation whenever a lagged dependent 


35/ This has been described in Section 2.2. Actually three CANSIM series 
were used: D40594 as the dependent variable (denoted CON) while the 
independent variable (denoted PDI) was constructed as 

D40552xD40594+D40254. 


131 


Example 4 

CON REG PDI 

[Computer output for the OLS regression of CON on PDI: correlation matrix 
(with t-values), mean of the dependent variable, estimated coefficients 
with standard errors and t-values, analysis-of-variance table, multiple 
correlation coefficient, corrected R², F-statistic, standard error of the 
estimate, Durbin-Watson statistic and coefficient of variation. The 
numerical entries of the printout are not legible in this copy.] 





Example 5 

[Computer output for the Cochrane-Orcutt program applied to CON and PDI: 
maximum number of iterations, convergence message, initial and final 
estimates of ρ, followed by the regression output for the transformed 
equation (correlation matrix, estimated coefficients, analysis-of-variance 
table and summary statistics), a note that the reported constant term is 
α(1-ρ), and the original constant term with its standard error. The 
numerical entries of the printout are not legible in this copy.] 


133 


Example 6 

[Computer output for the Hildreth-Lu program applied to CON and PDI: 
prompts for the Durbin-Watson statistic from the previous regression and 
for the number of iterations and size of increment, convergence message, 
initial and final estimates of ρ with the standard error of ρ, followed 
by the regression output for the transformed equation, a note that the 
reported constant term is α(1-ρ), and the original constant term with its 
standard error. The numerical entries of the printout are not legible in 
this copy.] 


134 


variable is included in the regression equation, which is effectively 
the case in both of these programs. 

Two final notes concerning autocorrelation may be useful. First, some 
studies have employed first differences (ρ = 1) in an attempt to 
eliminate this problem. Substitution of ρ = 1 into the transformed 
equation yields 

$$ (Y_t - Y_{t-1}) = \beta (X_t - X_{t-1}) + v_t $$

where $v_t = \varepsilon_t - \varepsilon_{t-1}$. This equation contains no intercept. The 
introduction of an intercept into this equation can only be justified 
by the introduction of a trend (t) variable in the original specification, 
for then 

$$ Y_t = \alpha + \beta X_t + \gamma t + \varepsilon_t $$

so that 

$$ (Y_t - Y_{t-1}) = \gamma + \beta (X_t - X_{t-1}) + v_t $$

This procedure is only justified if the trend specification is valid 
and if H0: ρ = 1 cannot be rejected by an appropriate test.36/ 
Second, as previously noted (see method 3) above), equation (27) involves 
a non-linear constraint on the parameters. To see this write the 


36/ See Section 5.3, test (iv). 


135 


specification more generally as 

$$ Y_t = \beta_1 + \beta_2 X_t + \beta_3 X_{t-1} + \beta_4 Y_{t-1} + v_t $$

and note that the transformed version requires that $\beta_2 \beta_4 = -\beta_3$. Although 
this constraint is ignored in the implementation of method 3), the equation 
could be estimated directly using non-linear least squares. This method 
is outlined in the Appendix where Example 15 displays the computer output 
corresponding to Examples 4 to 6 above. 
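
The origin of this constraint can be traced in a few lines. The sketch below assumes that the transformed equation takes the usual first-order form, with α and β the original coefficients and ρ the autocorrelation parameter; it is offered as an illustration rather than a restatement of result (27).

```latex
\begin{aligned}
Y_t - \rho Y_{t-1} &= \alpha(1-\rho) + \beta\,(X_t - \rho X_{t-1}) + v_t \\
Y_t &= \underbrace{\alpha(1-\rho)}_{\beta_1}
     + \underbrace{\beta}_{\beta_2} X_t
     + \underbrace{(-\rho\beta)}_{\beta_3} X_{t-1}
     + \underbrace{\rho}_{\beta_4} Y_{t-1} + v_t \\
\beta_2\,\beta_4 &= \rho\beta = -\beta_3
\end{aligned}
```

Four coefficients are therefore estimated from only three underlying parameters, which is why unrestricted least squares on the general form ignores information.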


6.5 Misspecification 


This is perhaps the most serious of all of the problems, since 
unlike the previous three problems, OLS in this case results in biased 
parameter estimates.37/ Misspecification refers to an incorrect 
definition of the original relationship. This can take two forms: 

     a) Misspecification of the form of the relationship 
        (e.g., linear, non-linear); 
     b) Misspecification of the content of the relationship, 
        such as the omission of important variables. 

This is not a violation of any of the assumptions as outlined in Chapter 
III, but rather a questioning of the basic equational form to which the 
assumptions refer. 


37/ Johnston proves this on p. 168. 


136 


The only test for this problem is a well formulated theore- 
tical model of the process being estimated and, naturally, the 
only remedy is to estimate the correct theoretical specification, 
both as to the form of the relationship and to the inclusion of 
variables. This is of little comfort to the analyst, since in many 
cases a well formulated theoretical model does not exist; but even 
if it does, it may not be possible to obtain the data required for 
its estimation. 

However, econometric theory can offer one suggestion regarding 
this problem. In particular, it can be shown38/ that the 
inclusion of irrelevant variables is much less serious than the 
exclusion of relevant variables. In the former case the OLS estimates 
are unbiased, in the latter they are biased. Thus the 
exclusion of relevant variables from the regression is a very serious 
error. With data and degrees of freedom permitting, the choice 
should be obvious -- all variables that are considered to be important 
should be included. However, there is no statistical test for 
this problem; the remedy lies in correctly specifying the relationship 
in the first place. This entails an examination of the underlying 
theory or institutional behavior in an attempt to specify the form 
of the relationship, as well as the collection of information on all 
the relevant variables. 
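
The asymmetry between the two errors can be seen in a small numerical sketch. The example below uses Python with NumPy and hypothetical, noise-free data (so that any departure from the true coefficients is bias rather than sampling error); the variable names are illustrative only.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical, noise-free data: y depends on x1 and x2 but not on x3.
x1 = np.arange(10.0)
x2 = 0.5 * x1 + np.array([1.0, -1.0] * 5)   # correlated with x1
x3 = np.cos(x1)                              # irrelevant variable
y = 1.0 + 2.0 * x1 + 3.0 * x2
ones = np.ones_like(x1)

# Relevant variable x2 excluded: the coefficient on x1 absorbs part of its effect.
short = ols(y, np.column_stack([ones, x1]))

# Irrelevant variable x3 included: the true coefficients are still recovered.
long_ = ols(y, np.column_stack([ones, x1, x2, x3]))

# Auxiliary regression of the omitted x2 on the included variables.
aux = ols(x2, np.column_stack([ones, x1]))

print("x1 coefficient, x2 omitted:  ", short[1])
print("x1 coefficient, x3 included: ", long_[1])
print("2 + 3 * auxiliary slope:     ", 2.0 + 3.0 * aux[1])
```

The first and third printed values coincide: the coefficient on x1 in the short regression equals its true value plus the coefficient of the omitted variable times the slope of the auxiliary regression, so the distortion disappears only if the omitted variable is uncorrelated with those included.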


38/ See Johnston, p. 169. 


137 


With regard to specifying the correct form of the relationship, 
the choice is often between a linear and a non-linear (say 
exponential) form. Often the underlying behavior is not very useful 
in this respect and a decision has to be made on empirical 
grounds. If so, the d-statistic of the linear relationship is 
often a good first guide to this problem. A low d suggests positive 
autocorrelation, which in turn might be caused by an inappropriate 
(as judged by the data) relationship. For example, 

[Figure: two diagrams contrasting the true and the estimated relationships.] 

the estimated residuals might follow the signs indicated in the 
above diagrams if a linear rather than a double log relationship 
(of the form y = Ax^β) is fitted.39/ Of course, if the linear is 
the true relationship and a log relationship is estimated, the signs 
will be reversed. If a choice has to be made between the two and 
the choice is not obvious from theory or the d-statistic, the parameter 
estimates and their significance can be useful. R² (or R̄²) 
is not a good comparative statistic since it is measured in terms of 
the dependent variable, which is different in each relationship. For 


39/ Recall result (4), Chapter I. 


138 


the same reason the SEE and CV are also unsatisfactory comparative 
statistics. For this and the other reasons outlined above, the 
problem of misspecification is one of the most difficult of the 


econometric problems faced by the analyst. 


6.6 Errors in Variables 


This problem arises when the X are not fixed; that is, 
when the independent variables are measured with some error. This 
results in a violation of assumption (iii) and is the worst situation 
possible since OLS estimates will be biased and inconsistent40/ 
(that is, the bias will not disappear if more information -- a larger 
n -- is obtained). This is very worrying because almost all data 
are measured with error. Moreover, as was the case with the previous 
problem, there are no tests (within the econometric model) for the 
presence of the problem and the remedies are often intractable. 
Consequently the problem is often ignored in estimation work. 
Alternatively, it can be argued that even though the data are measured 
with error, the behavior being modelled is based on the measured 
data and that these data should therefore be used in estimating the 
relevant model.41/ 
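
The nature of the inconsistency can be illustrated for the simplest case. The sketch below assumes a single independent variable whose true value $x_t^{*}$ is observed with an additive error $w_t$ that is uncorrelated with $x_t^{*}$ and with $u_t$; these symbols and assumptions are introduced here for illustration only.

```latex
\begin{aligned}
y_t &= \alpha + \beta x_t^{*} + u_t, \qquad x_t = x_t^{*} + w_t \\
\operatorname{plim}\, b &= \beta \,
   \frac{\sigma^2_{x^{*}}}{\sigma^2_{x^{*}} + \sigma^2_{w}}
\end{aligned}
```

Under these assumptions the OLS slope b is pulled towards zero by a factor that depends only on the relative variance of the measurement error, and not on the sample size, so that a larger n does not remove the bias.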


40/ See Johnston, pp. 281-282. 
41/ Johnston, p. 283. 


139 


This chapter has presented five of the most important problems 
in practical regression analysis. The next chapter discusses the 
implementation and application of GLS, while the two chapters which 
follow extend the single equation model to include further developments 


in the theory and practice of econometrics. 


140 


CHAPTER VII 
GENERALIZED LEAST SQUARES 


7.1 Introduction 


In the previous chapter the generalized least squares (GLS) 
estimates were introduced as the best, linear, unbiased estimators (BLUE) 
for a single equation where the underlying error structure has the more 
general variance-covariance structure given by 

$$ E(uu') = \sigma^2 \Omega $$

where Ω is an [n×n] symmetric, positive definite matrix. Under 
these conditions the GLS parameter estimates are given by 

$$ \hat{\beta} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y \qquad (21) $$

with variance-covariances 

$$ \mathrm{var}\,\hat{\beta} = \sigma^2 (X'\Omega^{-1}X)^{-1} \qquad (22) $$

which are obtained by replacing σ² with its estimate, S², calculated 
as 

$$ \hat{\sigma}^2 = S^2 = \frac{e'\Omega^{-1}e}{n-k} \qquad (23) $$

where e = y - Xβ̂ denotes the vector of GLS residuals. 
To implement these results Ω must be given as a prespecified known 


141 


matrix. This chapter outlines some possible uses for these estimators 
commencing with a brief review of the cases of heteroscedasticity and 
autocorrelation discussed in the previous chapter, together with an 
example of its implementation on the computer. The remainder of the 
chapter is devoted to outlining a number of other cases where this 
technique can be usefully applied. The chapter can be omitted without 
any loss of continuity in the presentation of the material in the 


remainder of the book. 
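
Results (21) to (23) translate directly into matrix operations. The following is a minimal sketch in Python with NumPy, offered as an illustration rather than a description of the GLS program discussed in the next section; the function name, the data and the assumption that the error variance is proportional to x are all hypothetical.

```python
import numpy as np


def gls(y, X, omega):
    """GLS coefficients, their covariance matrix and S^2, following (21)-(23)."""
    n, k = X.shape
    omega_inv = np.linalg.inv(omega)
    xt_oinv = X.T @ omega_inv
    a_inv = np.linalg.inv(xt_oinv @ X)                # (X' Omega^-1 X)^-1
    beta = a_inv @ (xt_oinv @ y)                      # result (21)
    resid = y - X @ beta
    s2 = float(resid @ omega_inv @ resid) / (n - k)   # result (23)
    cov_beta = s2 * a_inv                             # result (22), S^2 replacing sigma^2
    return beta, cov_beta, s2


# Hypothetical data: error variance assumed proportional to x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
X = np.column_stack([np.ones_like(x), x])
omega = np.diag(x)

beta, cov_beta, s2 = gls(y, X, omega)
print(beta, np.sqrt(np.diag(cov_beta)), s2)
```

With omega set to an identity matrix the function reproduces ordinary least squares, which provides a convenient check on any implementation.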


7.2 Heteroscedasticity and Autocorrelation - The GLS Program 


As noted in Sections 6.3 and 6.4 above, the problems of heteroscedasticity 
(non-constant diagonal elements of Ω) and autocorrelation 
(non-zero off-diagonal elements of Ω) provide 'logical' applications 
for GLS. In practice, however, these problems have tended to be solved 
by transformation approximations to GLS that can be implemented using 
ordinary least squares. These methods have been discussed above in 
Chapter VI. In this section the implementation of GLS as an alternative 
approach to these problems is briefly reviewed. 

Suggestions for ways of specifying an estimated Ω matrix to 
overcome the problem of heteroscedasticity have been outlined above 
(page 111). This matrix will contain n non-constant diagonal elements, 
with the elements estimated by one of the previously described methods, 
and zero off-diagonal elements. The matrix can be constructed using the 
procedures outlined in Chapter II (especially Section 2.3). It is then 


142 


available for insertion into the GLS program discussed below. 

The case of first-order autocorrelation is somewhat different 
since the estimated Ω matrix has a general form dependent only on 
the value of ρ̂ - see result (25) above. Consequently, since many 
users would find it necessary to construct this matrix for different 
values of ρ, the calculation of Ω has been pre-programmed into 
the GLS program. The option of using a pre-specified Ω̂ (as for the 
heteroscedasticity case) is still available to the user, as is a third 
option of having the package internally calculate a value of ρ using 
the formula1/ 

$$ \hat{\rho} = \frac{\sum_{t=1}^{n-1} e_t e_{t+1}}{(n-1) S^2} $$
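
Both steps can be sketched briefly in Python with NumPy. The sketch assumes that the first-order matrix has the usual form with elements ρ raised to the power |i - j| (any scalar multiple gives the same GLS coefficients), and that S² is the residual sum of squares divided by (n - k); the function names are hypothetical and are not those of the GLS program.

```python
import numpy as np


def ar1_omega(rho, n):
    """[n x n] matrix with (i, j) element rho ** |i - j| (assumed form)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def estimate_rho(resid, k):
    """Sum of products of successive residuals divided by (n - 1) * S^2."""
    n = len(resid)
    s2 = float(resid @ resid) / (n - k)
    return float(resid[:-1] @ resid[1:]) / ((n - 1) * s2)


print(ar1_omega(0.65, 4))
print(estimate_rho(np.array([1.0, -1.0, 1.0, -1.0]), k=2))
```

The resulting matrix can be passed, as the known Ω, to any routine that evaluates results (21) to (23).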


Example 7 illustrates the use of the GLS program using the 
first-order autocorrelation example employed in Examples 4 to 6 above. 
Note that, in this example, ρ̂ = 0.65 is used. This was obtained by 
method 1) (see page 125). The resulting parameter estimates are quite 
close to those obtained in the previous examples, as are the equation 
statistics. Note that here 48 observations are used in the estimation 
procedure and, consequently, results that are identical with the GLS 
approximations presented in Section 6.4 are not to be expected. 


1/ See H. Theil, Principles of Econometrics, p. 254. 


143 


Example 7 

CON GLS PDI 

[Computer output for the GLS program applied to CON and PDI: prompts 
asking whether the covariance matrix is known and for the value of ρ, 
followed by the correlation matrix (with t-values), mean of the dependent 
variable, estimated coefficients with standard errors and t-values, 
analysis-of-variance table, multiple correlation coefficient, corrected 
R², F-statistic, standard error of the estimate, Durbin-Watson statistic 
and coefficient of variation. The numerical entries of the printout are 
not legible in this copy.] 


144 


7.3 Grouped Data 


Condensed summaries or groupings of sample observations, such as 
averages, are often all that is available to the analyst. This may be 
for reasons of confidentiality or convenience, but whatever the reason 
it introduces problems for the application of ordinary least squares. 
Examples of this type of data are common: annual or quarterly time-series 
that are constructed from monthly data, and classifications of individual 
survey responses reporting averages by income class, region or 
some other classifier, all involve the grouping of observations before 
estimation. 


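By assuming that the multivariate linear model 

$$ y = X\beta + u \qquad (6) $$

applies to the original unavailable and ungrouped data of n observations, 
and considering a known linear grouping into m groups, denoted by the 
[m×n] matrix G, it is possible to examine the consequences of estimation 
with the available grouped data 

$$ \tilde{y} = Gy \quad \text{and} \quad \tilde{X} = GX $$

where ỹ is [m×1] and X̃ is [m×k]. For example, if G represents 
a quarterly averaging of monthly data, it will take the form 

$$
G = \begin{bmatrix}
\tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3}
\end{bmatrix}
$$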


145 


Any type of linear grouping can be captured in a similar manner. 
Premultiplying equation (6) by G yields 

$$ Gy = GX\beta + Gu $$

which can be written as 

$$ \tilde{y} = \tilde{X}\beta + \tilde{u} \qquad (28) $$

where ũ = Gu and ỹ and X̃ are defined above. Equation (28) can be 
estimated from the available grouped data, but an examination of the 
variances of the residuals shows that 

$$ E(\tilde{u}\tilde{u}') = E(Guu'G') = G\,E(uu')\,G' = \sigma^2 GG' $$

assuming that u satisfies the usual assumptions (see page 26). This 
demonstrates that the errors in equation (28), which is to be estimated, 
are heteroscedastic with 

$$ \Omega = GG' $$

Since G is a known grouping matrix, Ω can be easily calculated and 
GLS used to estimate equation (28). Note that if each group contains 
the same number of observations (as is the case in the quarterly averaging 
example above), GLS reduces to ordinary least squares since GG' 


146 


(and hence Ω and Ω⁻¹) is a diagonal matrix with constant elements.2/ 
Note also that the R² computed from grouped data can be a misleading 
indicator of (that is, higher than) the R² relevant to the ungrouped 
data, especially if the method of grouping has been chosen to maximize 
the between-group variation compared with the within-group variation.3/ 
This results from the fact that the grouped observations tend to be 
much less dispersed around the estimated regression than the original, 
or ungrouped, observations. For this reason increasing the amount of 
information by going from a one-way to a two-way grouping (or 
cross-classification table) does not change the parameter estimates but does 
change the estimated standard errors (which are more efficient) and 
R².4/ However, the introduction of grouping by more than one independent 
variable using only one-way classifications can lead to problems.5/ 
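
The role of group size can be checked numerically. The sketch below, in Python with NumPy, builds an averaging matrix G for consecutive groups and inspects the diagonal of GG'; the function name and the group sizes are hypothetical.

```python
import numpy as np


def averaging_matrix(group_sizes):
    """[m x n] matrix G that averages consecutive blocks of observations."""
    m, n = len(group_sizes), sum(group_sizes)
    G = np.zeros((m, n))
    start = 0
    for row, size in enumerate(group_sizes):
        G[row, start:start + size] = 1.0 / size
        start += size
    return G


# Quarterly averages of 12 monthly observations: equal groups of three.
G_equal = averaging_matrix([3, 3, 3, 3])
print(np.diag(G_equal @ G_equal.T))      # constant diagonal, so OLS suffices

# Hypothetical unequal groups: the diagonal of GG' now varies with group size.
G_unequal = averaging_matrix([2, 4, 6])
print(np.diag(G_unequal @ G_unequal.T))
```

In the unequal case each diagonal element is the reciprocal of the group size, so GLS amounts to weighting each group average by the number of observations it contains.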


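2/ For example, consider the simple case where 6 observations are 
grouped by averages of successive pairs of observations. Then 

$$
G = \begin{bmatrix}
\tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 & 0 & 0 \\
0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2}
\end{bmatrix}
\quad \text{and} \quad
GG' = \begin{bmatrix}
\tfrac{1}{2} & 0 & 0 \\
0 & \tfrac{1}{2} & 0 \\
0 & 0 & \tfrac{1}{2}
\end{bmatrix}
= \tfrac{1}{2} I_3
$$

where I_3 is an identity matrix of order [3×3]. Consequently, 
σ²Ω = σ²GG' = (σ²/2) I_3, which is homoscedastic. 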
3/ See Johnston, pp. 230-232. 


4/ This case is examined by J. Kmenta, Elements of Econometrics, 
pp. 329-335. 


5/ See Johnston, pp. 234-236. 


147 
7.4 Stochastic Prior Information 


The introduction of outside, or extraneous, information was 
suggested as one means of overcoming the multicollinearity problem (see 
Section 6.2). In that context, a prior fixed estimate of one (or more) 
of the regression parameters was incorporated into the estimation 
procedure. Seldom is such information available with certainty - often 
it is subject to a sampling variance of one kind or another, as is the 
case with the parameter estimates obtained from previous empirical work. 
This can be called stochastic prior information. 

This kind of prior information can be incorporated into the 
estimation procedure6/ by considering g independent linear constraints 
which are stochastic. These can be expressed as 

$$ r = R\beta + v \qquad (29) $$

where     r is a [g×1] vector of prior parameter estimates 
          R is a [g×k] matrix of known coefficients 
and       v is a [g×1] vector of errors 

with 

$$ E(v) = 0 \quad \text{and denote} \quad E(vv') = \psi $$





6/ The use of prior information in the estimation procedure is the 
subject of Bayesian econometrics. See, for example, A. Zellner, 
An Introduction to Bayesian Inference in Econometrics. 


148 


where ψ is a known [g×g] matrix.7/ Note that (29) also can include 
prior information on linear combinations of the parameters.8/ 

Equations (6) and (29) can be combined and written as one equation 

$$ \begin{bmatrix} y \\ r \end{bmatrix} = \begin{bmatrix} X \\ R \end{bmatrix} \beta + w \qquad (30) $$

where 

$$ w = \begin{bmatrix} u \\ v \end{bmatrix}, \quad E(w) = 0 \quad \text{and} \quad E(ww') = \begin{bmatrix} \sigma^2 I_n & 0 \\ 0 & \psi \end{bmatrix} $$

The resulting variance-covariance matrix of errors, which is square of 


7/ Johnston, p. 222, provides the example 1/2 = β_j + v_h where E(v_h) = 0 
and E(v_h²) = 1/16, so that one standard deviation either side of the 
prior estimate gives the range 1/4 to 3/4 and two standard deviations 
either side gives the range 0 to 1. Inequality constraints can also 
be viewed in this way. J. Kmenta, Elements of Econometrics, p. 435, 
notes that if all values of β_j within a prescribed interval 
a < β_j < b are considered equally likely, then β_j = (a + b)/2 + v_h, 
where v_h has the continuous uniform distribution with E(v_h) = 0 
and E(v_h²) = (b - a)²/12. This formula shows quite clearly that the 
more strongly the analyst believes in the prior information, the 
smaller will be the diagonal elements in ψ. 

8/ This subject was first introduced in Section 4.15. 


149 


order (n+g), includes matrices of zeros in the upper-right and bottom-left 
corners which reflect the independence between the sample and prior information, 
and is almost certainly heteroscedastic since even if ψ is a diagonal 
matrix it is unlikely that the variances on each of the g pieces of 
prior information will be the same (that is, it is unlikely that ψ 
is homoscedastic) and only by coincidence will these variances be the 
same as σ². Consequently, equation (30) must be estimated by GLS. 
It can be shown that the resulting estimators are a weighted average of 
the two types of information, namely the sample and prior information.9/ 
One difficulty with implementing GLS in this context is that it involves 
an estimate of σ². The most easily justifiable estimate is obtained by 
applying ordinary least squares to the sample data alone,10/ but it is 
also possible to iterate on S² using successively obtained parameter 
estimates to generate new vectors of residuals and applying result (23). 

Theil has proposed a test for examining the compatibility of the 
sample and prior information.11/ If u and v (and hence w) are 
normally distributed, the scalar 

$$ (r - R\hat{\beta})' \left[ \sigma^2 R (X'\Omega^{-1}X)^{-1} R' + \psi \right]^{-1} (r - R\hat{\beta}) $$

where β̂ is the GLS coefficient estimator (21), has a χ² distribution 
with g degrees of freedom under the null hypothesis (30) that the sample 


9/ See Johnston, pp. 222-223. 


10/ Johnston, pp. 226-227, suggests this procedure. 
11/ See H. Theil, Principles of Econometrics, pp. 350-351. 


150 


and prior information are compatible with each other. This magnitude 
can be quite easily calculated using the data manipulation operations 
outlined in Chapter II (see, especially, Sections 2.3 to 2.6). The 
relevant table of critical values can be located in most texts (see, 
for example, Johnston, p. 427) and the test follows the same procedure 


as outlined for the t and F-tests in Chapter V. 
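
The estimation of equation (30) and the compatibility statistic can be sketched as follows in Python with NumPy. The sketch assumes that the sample errors are homoscedastic and non-autocorrelated (Ω equal to an identity matrix), estimates σ² from the sample alone, and uses hypothetical data together with a hypothetical prior estimate of 2.0 for the slope with prior variance 0.04; the function names are illustrative only.

```python
import numpy as np


def mixed_estimate(y, X, r, R, psi, sigma2):
    """GLS on the stacked system (30), assuming E(uu') = sigma2 * I_n."""
    n, g = len(y), len(r)
    y_star = np.concatenate([y, r])
    X_star = np.vstack([X, R])
    cov_w = np.zeros((n + g, n + g))
    cov_w[:n, :n] = sigma2 * np.eye(n)   # sample block
    cov_w[n:, n:] = psi                  # prior block; off-diagonal blocks stay zero
    cov_inv = np.linalg.inv(cov_w)
    lhs = X_star.T @ cov_inv @ X_star
    return np.linalg.solve(lhs, X_star.T @ cov_inv @ y_star)


def compatibility_statistic(y, X, r, R, psi, sigma2):
    """Quadratic form to be compared with chi-square(g); sample-only estimator used."""
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    diff = r - R @ b
    middle = sigma2 * (R @ xtx_inv @ R.T) + psi
    return float(diff @ np.linalg.solve(middle, diff))


# Hypothetical sample and prior information on the slope coefficient.
x = np.arange(1.0, 9.0)
y = np.array([2.9, 4.2, 5.8, 7.1, 8.4, 10.2, 11.3, 12.9])
X = np.column_stack([np.ones_like(x), x])
r, R, psi = np.array([2.0]), np.array([[0.0, 1.0]]), np.array([[0.04]])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_ols
s2 = float(resid @ resid) / (len(y) - X.shape[1])   # sigma^2 from the sample alone

b_mixed = mixed_estimate(y, X, r, R, psi, s2)
chi2_stat = compatibility_statistic(y, X, r, R, psi, s2)
print(b_ols, b_mixed, chi2_stat)
```

The mixed estimate of the slope lies between the OLS estimate and the prior value, moving closer to the prior as the elements of ψ are made smaller.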


7.5 Linear Constraints 


As set out in Section 4.15 and noted in Section 7.4 above, prior 
information in terms of fixed linear constraints can be handled relatively 
easily within the ordinary least squares framework. In this section the 
general case is briefly reviewed and extended to GLS using the notation 
introduced in Section 7.4. In this case, equation (29) holds without 
error and, in the case of ordinary least squares, the SSR of equation (6) 
must be minimized (or the relevant likelihood function maximized) subject 
to this constraint. This results in an estimator which can be written 
as12/ 

$$ \beta^{*} = \hat{\beta} + (X'X)^{-1} R' \left[ R (X'X)^{-1} R' \right]^{-1} (r - R\hat{\beta}) \qquad (31) $$

where β̂ is the unconstrained ordinary least squares estimator (result 
(8)). Once again, the right-hand side of result (31) can be calculated 
using the data manipulation operations outlined in Chapter II. 
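
Result (31) can be evaluated in a few lines. The following is a minimal sketch in Python with NumPy; the function name, the data and the constraint (that the two slope coefficients sum to one) are hypothetical.

```python
import numpy as np


def restricted_ols(y, X, R, r):
    """OLS subject to R beta = r, following result (31); also returns unrestricted b."""
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    middle = R @ xtx_inv @ R.T
    b_star = b + xtx_inv @ R.T @ np.linalg.solve(middle, r - R @ b)
    return b_star, b


# Hypothetical data; constraint: the two slope coefficients sum to one.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([1.9, 2.2, 4.1, 4.0, 6.3, 5.8, 8.1, 8.0])
X = np.column_stack([np.ones_like(x2), x2, x3])
R = np.array([[0.0, 1.0, 1.0]])
r = np.array([1.0])

b_star, b = restricted_ols(y, X, R, r)
print(b, b_star, R @ b_star)
```

The constrained estimates satisfy the restriction exactly, and the SSR of the constrained regression can never be smaller than that of the unconstrained regression.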


12/ See Johnston, p. 158 where the associated variance-covariance matrix 
is also presented. 


151 


The comparable GLS estimators can be obtained by replacing (X'X) 
with (X'Ω⁻¹X) in result (31) (and the associated variance-covariance 
matrix).13/ Note that many of the results in Sections 7.4 and 7.5 
involve exactly the same matrix combinations so the analyst should be 
careful not to duplicate the operation if more than one result is to be 
used. 

The joint significance of the g linear restrictions (result 
(29) with v = 0) can be tested by an F-test (similar to Section 5.5) 
which compares the SSR of the constrained regression with that of the 
unconstrained regression. The relevant test statistic is14/ 

$$ F(g, n-k) = \frac{(r - R\hat{\beta})' \left[ R (X'\Omega^{-1}X)^{-1} R' \right]^{-1} (r - R\hat{\beta})}{g S^2} $$

where β̂ is the GLS coefficient estimator (21) and S² is defined in 
result (23). Note that the degrees of freedom for the numerator equals 
the number of constraints imposed and that this formula applies to the 
ordinary least squares case with Ω = I_n, where I_n is an identity 
matrix of order [n×n]. 
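
The statistic can be computed as sketched below in Python with NumPy. The function name and the data are hypothetical, and Ω defaults to an identity matrix so that the ordinary least squares case is obtained when no matrix is supplied.

```python
import numpy as np


def f_test_linear_restrictions(y, X, R, r, omega=None):
    """F statistic for H0: R beta = r, with its two degrees of freedom."""
    n, k = X.shape
    g = R.shape[0]
    omega_inv = np.eye(n) if omega is None else np.linalg.inv(omega)
    a_inv = np.linalg.inv(X.T @ omega_inv @ X)
    beta = a_inv @ X.T @ omega_inv @ y                 # result (21)
    resid = y - X @ beta
    s2 = float(resid @ omega_inv @ resid) / (n - k)    # result (23)
    diff = r - R @ beta
    quad = float(diff @ np.linalg.solve(R @ a_inv @ R.T, diff))
    return quad / (g * s2), g, n - k


# Hypothetical data; H0: the two slope coefficients sum to one.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([1.9, 2.2, 4.1, 4.0, 6.3, 5.8, 8.1, 8.0])
X = np.column_stack([np.ones_like(x2), x2, x3])
R = np.array([[0.0, 1.0, 1.0]])
r = np.array([1.0])

f_stat, df_num, df_den = f_test_linear_restrictions(y, X, R, r)
print(f_stat, df_num, df_den)
```

In the ordinary least squares case the same value is obtained from the difference between the constrained and unconstrained SSR, divided by g, relative to the unconstrained SSR divided by (n - k).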


13/ See Theil, op. cit., p. 285 (where his V⁻¹ is equivalent to Ω⁻¹, 
see Theil, p. 278). 


14/ See Theil, op. cit., p. 314. 


152 


7.6 Grouped Equations 


Up until this section, the concern has been with the estimation 
of the parameters of a single equation. This section introduces the 
concept of a group of equations and discusses the joint estimation of 
the associated parameters. Further consideration of this topic is then 
left until Chapter IX. 

Consider m equations, each following the specification of 
equation (6), but each with different dependent variables and different 
matrices of independent variables; that is 

$$
\begin{aligned}
y_1 &= X_1 \beta_1 + u_1 \\
y_2 &= X_2 \beta_2 + u_2 \\
    &\;\;\vdots \\
y_m &= X_m \beta_m + u_m
\end{aligned}
\qquad (32)
$$

where 
     y_1, y_2, ..., y_m are each [n×1] vectors 
     X_1, X_2, ..., X_m are each [n×k] matrices 
     β_1, β_2, ..., β_m are each [k×1] vectors 
and 
     u_1, u_2, ..., u_m are each [n×1] vectors. 


153 


This system (32) can be written as 

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
=
\begin{bmatrix}
X_1 & 0 & \cdots & 0 \\
0 & X_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & X_m
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix}
+
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix}
$$

or, alternatively, as 

$$ y = X\beta + u \qquad (32) $$

where     y is of order [nm×1] 
          X is of order [nm×mk] 
          β is of order [mk×1] 
and       u is of order [nm×1]. 

Now E(u) = 0 and denote E(uu') = Σ which is of order [nm×nm]. 
This variance-covariance matrix of errors reflects the error properties 
of each individual equation (in the m [n×n] blocks on the diagonal 
of Σ) and any contemporaneous and lagged covariances between errors 
in any pair of equations (in the [n×n] off-diagonal blocks 
of Σ). It is highly unlikely that Σ will be 
homoscedastic and, consequently, equation (32) must be estimated 
by GLS. The relevant formulae are given by results (21) to (23) 
with Ω replaced by Σ. This is often referred to as the 
method of estimating seemingly unrelated regressions. All of 


154 


the results of Sections 7.4 and 7.5 can also be carried across into this 
context although, needless to say, simplifying assumptions are usually 
invoked to make the solutions computationally tractable.15/ It can be 
shown that the system can be efficiently estimated by using OLS equation 
by equation only if there are no cross-equation covariances, or if 
X_1 = X_2 = ... = X_m.16/ Otherwise Σ must be estimated (usually by 
recourse to the use of errors from a prior OLS regression) and GLS 
applied to obtain an efficient estimate β̂ and associated variances 
and covariances. 
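
The joint estimation can be sketched in Python with NumPy as follows. The sketch assumes the simplified error structure in which Σ is the Kronecker product of an [m×m] matrix of contemporaneous covariances and an identity matrix of order n, and it treats that [m×m] matrix as known; the function name, the data and the covariance values are hypothetical.

```python
import numpy as np


def sur_gls(ys, Xs, sigma_c):
    """GLS for the stacked system with E(uu') = sigma_c (Kronecker product) I_n."""
    n, m = len(ys[0]), len(ys)
    y = np.concatenate(ys)
    k_total = sum(Xi.shape[1] for Xi in Xs)
    X = np.zeros((n * m, k_total))           # block-diagonal matrix of regressors
    col = 0
    for i, Xi in enumerate(Xs):
        X[i * n:(i + 1) * n, col:col + Xi.shape[1]] = Xi
        col += Xi.shape[1]
    sigma_inv = np.kron(np.linalg.inv(sigma_c), np.eye(n))
    lhs = X.T @ sigma_inv @ X
    return np.linalg.solve(lhs, X.T @ sigma_inv @ y)


# Two hypothetical equations with six observations each.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y1 = np.array([2.0, 3.1, 3.9, 5.2, 6.1, 6.8])
y2 = np.array([1.2, 0.7, 2.1, 1.4, 3.2, 2.4])
X1 = np.column_stack([np.ones_like(x), x])
X2 = np.column_stack([np.ones_like(z), z])
sigma_c = np.array([[1.0, 0.6], [0.6, 2.0]])   # assumed contemporaneous covariances

beta_sur = sur_gls([y1, y2], [X1, X2], sigma_c)
print(beta_sur)
```

Setting the off-diagonal elements of sigma_c to zero, or using the same matrix of independent variables in every equation, returns the equation-by-equation OLS estimates, in line with the two conditions stated above.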

A somewhat different application of the same technique is encounter- 
ed when the analyst is faced with a time-series of cross-sections or a 
cross-section of time-series, such as observations on a number of individ- 
uals (households, firms, provinces, etc.) over time. For example, in 


equation (32), each of the equations may represent a cross-section and 


15/ One such assumption (see Johnston, p. 239) is 

$$ E(u_i u_j') = \sigma_{ij} I_n \qquad (i, j = 1, 2, \ldots, m) $$

which states that the disturbance term in any single equation (i=j) 
is homoscedastic and non-autocorrelated (but it can be different in 
different equations), and that (for i≠j) there is a non-zero 
correlation between contemporaneous disturbances but zero correlations 
between all lagged disturbances. This assumption permits Σ to be 
written as the Kronecker product 

$$ \Sigma = [\sigma_{ij}] \otimes I_n $$

and hence 

$$ \Sigma^{-1} = [\sigma_{ij}]^{-1} \otimes I_n $$

which replaces Ω⁻¹ in all the above formulae. 


16/ See Johnston, p. 240 who leaves the proof of the second proposition 
to an exercise. 


155 


such cross-sections may exist for time periods 1, 2, ..., m. Alternatively, 
each of the equations may represent a time-series with such 
time-series existing for individuals 1, 2, ..., m. The following 
technique is particularly useful when there are insufficient 
observations in either one of these dimensions.17/ The combination 
of these two types of information is commonly referred to as pooling 
(see page 95). 

Pooling introduces a dimension of complexity into model specification 
- this time with respect to the error structure which now 
comprises time-series and cross-section related factors, as well as 
the combination of the two. A generalized Σ matrix can capture these 
interrelationships but, as with the case of seemingly unrelated regressions, 
simplifying assumptions are usually invoked to make the solutions 
computationally tractable.18/ As with the previous case, the system can only be 
efficiently estimated by using ordinary least squares equation by equation 
if there are no cross-equation covariances, or if X_1 = X_2 = ... = X_m. 
One such application has been examined above (see Section 5.5, test (iii)) 
and it should be reviewed before considering the more complicated 
applications of GLS. 


17/ Two practical Canadian examples can be used to illustrate this point. 
A cross-section using data from the 10 provinces in Canada is usually 
ruled out because of insufficient observations, while a time-series 
analysis of the coefficients in the recently released 11 input-output 
tables (for 1961 to 1971 inclusive) would be difficult for the same 
reason. 


18/ A useful review of some of the alternative assumptions can be found in 
J. Kmenta, Elements of Econometrics, Section 12-2. 


156 


In reviewing some of the numerous applications of GLS, this chapter 
has attempted to encourage the analyst to be somewhat innovative in the 
specification of a problem. The same general estimation program (outlined 
in Section 7.2) can handle all of the cases described in this chapter if 
they are correctly specified. Moreover, the data manipulation operations, 
outlined in Chapter II, can also be used to advantage. Prediction problems 
with GLS are considered in the next chapter and further analysis of systems 


of equations is presented in Chapter IX. 


157 


CHAPTER VIII 


PREDICTION AND DISTRIBUTED LAGS 


8.1 Prediction with OLS 


Often one of the objectives in the estimation of an econometric 
equation is the ability to use that equation for purposes of prediction 
(or forecasting). In the case of time-series analysis, this usually 
takes the form of predicting into future time periods, while for cross- 
section analysis it often takes the form of predicting behaviour over 
ranges of the independent variables which were not included in that 
sample of observations initially gathered for the estimation procedure. 
The following discussion is applicable to both of these cases. 

To obtain a predicted value for the dependent variable, a complete 
set of observations on all of the independent variables appearing in the 
equation must be obtained for the observations in the prediction sample. 
The resulting prediction is then conditional upon these values of the 
independent variables. This often poses a problem, since in order to 
obtain predicted values for the dependent variable using the estimated 
equation, the analyst must first obtain predicted values for the indepen- 
dent variables. These may be available from some outside source (such as 
an econometric model) or they may be determined by institutional changes 
(such as a change in tax rates), but more often than not the only avail- 
able method of obtaining them is "guesstimation" by the analyst. 

Assuming that the values for the independent variables are avail- 


able for the prediction period, they can be arranged in the same order 


158 


as the variables in the original regression (and the X matrix) and 
denoted by a new matrix1/ 

$$ X_p = [\, 1 \quad X_{2p} \quad \cdots \quad X_{kp} \,] $$

where the subscript p stands for "predicted". Note that this matrix 
is of order [f×k] where f is the number of observations (or periods 
in the case of time-series) being predicted (or forecasted). When f=1 
the equation is being used to predict a single value of the dependent 
variable. 

Given the estimated parameters ($\hat{\beta}$) obtained from estimating
the equation by ordinary least squares over the sample period (see result
(8) in Chapter III), a point prediction of the expected value of the
dependent variable can be easily obtained as

$$\hat{y}_p = X_p \hat{\beta}. \qquad (33)$$

Note that since $\hat{\beta}$ is [k×1], $\hat{y}_p$ is of order [f×1];
these are the f predicted values of the dependent variable.

The analysis should not end here, since a confidence interval
should be given with every prediction. Here the analyst is faced with
a choice between

(i) obtaining a confidence interval for the mean value of y associated
with the given $X_p$, or


1/ Johnston uses Xo on page 38, c' on pages 126 and 153, and xo on 
page 212. See also footnote 3 in this chapter. 


159 


(ii) obtaining a confidence interval for an individual $y_p$ value
associated with the given $X_p$.

The latter is very useful when the actual $y_p$ is available and
the analyst is interested in whether the partitioned matrix of
observations in the forecast period, namely

$$[\,y_p : X_p\,]$$

belongs to the same linear structure that was estimated over the
sample period. The confidence interval associated with (ii) will
always be wider than that associated with (i).
It can be shown 2/ that the estimated variance associated
with the point predictor (33) is

$$\mathrm{var}(X_p \hat{\beta}) = S^2 X_p (X'X)^{-1} X_p' \qquad (34)$$

where S is the SEE defined in Chapter III (result (10)). This
variance-covariance matrix is of order [f×f]. 3/ Consequently
an approximate confidence interval for the mean value of y associated
with a given $X_p$ (case (i) above) is

$$\hat{y}_p \pm t' S \sqrt{X_p (X'X)^{-1} X_p'} \qquad (35)$$


2/ See Johnston, p. 153. 


3/ Note that Johnston defines c as [k×f], so that when f=1 this
is a column vector (p. 153). Consequently c' is equivalent
to $X_p$ in the notation used above.


160 


where t' is obtained from the tables of the t-distribution (with
(n-k) degrees of freedom). Note that $\hat{\beta}$, S and $(X'X)^{-1}$
are needed from the estimated equation in order to calculate results
(33) through (35).

If the actual $y_p$ is available and the analyst wishes to
see whether it lies in the confidence interval obtained from the
estimated equation (case (ii) above), then the appropriate estimated
variance of the discrepancy between the predicted and actual
values is 4/

$$S^2 \left[ I_f + X_p (X'X)^{-1} X_p' \right]$$

where $I_f$ is an identity matrix of order [f×f], and the appropriate
confidence interval is therefore

$$\hat{y}_p \pm t' S \sqrt{I_f + X_p (X'X)^{-1} X_p'} \qquad (36)$$

Note that in both (35) and (36) above it is only the diagonal elements
(the variances) of the [f×f] variance-covariance matrix which are used
in the calculation of the magnitudes of the error bands.
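
As a rough illustration of results (33), (35) and (36), the following Python sketch computes point predictions and the two half-widths of the error bands. It is not the PREDICT program: the data, the function names and the small matrix helpers are invented for illustration, and only the standard library is assumed.

```python
# Point predictions (33) with error bands (35) and (36), in plain Python.
# The data are invented; the helpers are minimal and meant for small matrices.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def predict(X, y, Xp, t_crit):
    """Return (point, half_width_mean, half_width_individual) per row of Xp."""
    n, k = len(X), len(X[0])
    xtx_inv = inverse(matmul(transpose(X), X))
    beta = matmul(xtx_inv, matmul(transpose(X), [[v] for v in y]))
    resid = [yi - sum(b[0] * xi for b, xi in zip(beta, row)) for yi, row in zip(y, X)]
    s = (sum(e * e for e in resid) / (n - k)) ** 0.5          # the SEE
    out = []
    for row in Xp:
        point = sum(b[0] * xi for b, xi in zip(beta, row))    # result (33)
        # diagonal element of Xp (X'X)^-1 Xp' for this observation
        h = matmul(matmul([row], xtx_inv), [[v] for v in row])[0][0]
        out.append((point,
                    t_crit * s * h ** 0.5,                    # result (35)
                    t_crit * s * (1 + h) ** 0.5))             # result (36)
    return out

X = [[1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0], [1, 5.0]]   # column of ones plus one regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]
Xp = [[1, 6.0], [1, 8.0]]                                # f = 2 prediction observations
results = predict(X, y, Xp, t_crit=2.353)                # t' for 3 degrees of freedom, 90% level
for point, hw_mean, hw_ind in results:
    print(round(point, 3), round(hw_mean, 3), round(hw_ind, 3))
```

The band for an individual value is wider than the band for the mean in every row, and both widen as the prediction point moves away from the sample mean of the regressor.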


In order to test explicitly whether the new set of observations


4/ See Johnston, p. 154, for the case when f=1. 


161 


come from the same structure that is presumed to have generated
the n sample observations, calculate the f values of 5/

$$t = \frac{y_p - \hat{y}_p}{S \sqrt{1 + X_p (X'X)^{-1} X_p'}} \qquad (37)$$


Note that these t-values test each observation separately. When
f=1, only one t-value is calculated. For f>1, it is also possible
to test whether all the f observations together come from the same
structure that generated the sample observations. This is a joint
(F) test which involves a comparison of the SSR obtained from the
sample period estimation on n observations (denoted $SSR_n$) with
the SSR obtained when the additional f observations are added and
the equation estimated on the pooled sample of (n+f) observations
(denoted $SSR_{n+f}$). The F-statistic is 6/

$$F = \frac{(SSR_{n+f} - SSR_n)/f}{SSR_n/(n-k)} \qquad (38)$$

Note that the pooled equation imposes the same structure on the
additional f observations as was assumed to have generated the first
n observations. Hence f restrictions are imposed relative to


5/ The relevant test here is a two-tail test. This is analogous 
to t-test (iii) discussed in Section 5.3. 


6/ See Johnston, p. 207. Also see Chapter V, Section 5.5, for a 
review of F-tests. 


162 


the original regression on n observations, and as a result f
is the appropriate degrees of freedom for the numerator of the F-test.
Alternatively, this can be calculated as the degrees of freedom
associated with $SSR_{n+f}$ minus the degrees of freedom associated
with $SSR_n$, or

(n+f-k) - (n-k) = f.

Finally, remember that since f is likely to be small, the critical
value of F will tend to be large.
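
The arithmetic of result (38) can be sketched in a few lines of Python. The function name and the sums of squared residuals below are hypothetical; only the formula itself comes from the text.

```python
def predictive_f(ssr_n, ssr_pooled, n, k, f):
    """F-statistic of result (38): f numerator and (n - k) denominator degrees of freedom."""
    return ((ssr_pooled - ssr_n) / f) / (ssr_n / (n - k))

# Hypothetical figures: 59 sample observations, 3 coefficients, 3 new observations.
f_stat = predictive_f(ssr_n=120.0, ssr_pooled=135.0, n=59, k=3, f=3)
print(round(f_stat, 3))
```

The resulting value would be compared with the tabulated critical value of F with (f, n-k) degrees of freedom.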


The PREDICT Program 


The PREDICT program, available with REGR in the public
library workspace 32 REGRESSION, calculates results (33), (35) and (36)
above for a given t' specified by the user. In order to perform
the calculations, the program requires those values for $\hat{\beta}$,
S and $(X'X)^{-1}$ which resulted from the estimation of the original
equation with n observations; only t' and $X_p$ must be specified
by the user. To obviate the need for direct specification of the
information required from the original estimation period, the program
must be executed immediately following the estimation (by means
of the REGR program) of the equation which is to be used for
prediction. $\hat{\beta}$, S and $(X'X)^{-1}$ are then automatically
retrieved.


163 


Example 8


      PREDICT
ENTER VALUES FOR EACH OF THE 2 INDEPENDENT VARIABLES.
⎕:
      [population 164 87 19 and GNP 221 155 4 for the three new observations]
ENTER T-VALUE FOR 56 DEGREES OF FREEDOM.
⎕:
      1.673


CONFIDENCE INTERVAL FOR MEAN VALUE OF Y:

LOWER LIMIT     POINT ESTIMATE     UPPER LIMIT
[three rows of output; figures not legibly reproduced]


CONFIDENCE INTERVAL FOR Y:

LOWER LIMIT     POINT ESTIMATE     UPPER LIMIT
[three rows of output; figures not legibly reproduced]


164 


The results presented in Example 8 (page 163) were obtained 
following the ordinary least squares analysis performed in Example 
2 (page 47). Observations for the population and GNP for three 
more countries have been gathered; the object is to predict the 
size of armed forces of those three countries. These could be 
used to discover whether these observations can be assumed to have 
been generated by the same structure which generated the original 
n observations. 

The three new values (note that f=3 here) for population
are 164, 87 and 19, those for GNP 221, 155 and 4. A 90% confidence
level has been chosen (t'=1.673). Note that the confidence interval
for an individual $y_p$ value (case (ii) above) is wider than
that for the mean value of $y_p$ (case (i) above). As mentioned
above, this will always be true.


8.2 Prediction with GLS 


It can be recalled that problems for which generalized least 
squares (see Chapter VII) is the appropriate estimation technique 
can sometimes be transformed into an exact, or approximately exact, 
specification which enables ordinary least squares to be applied to 
obtain efficient parameter estimates (see, for example, Sections 6.3 
and 6.4). In these cases the procedures and estimation program 


described in Section 8.1 above are appropriate providing that values 


165 


in $X_p$ refer to the appropriately transformed variables. The resulting
$\hat{y}_p$ (and associated standard errors and confidence intervals) then
refer to the transformed dependent variable from which point estimates 
for the original dependent variable can be obtained. However, it is 
usually extremely difficult, and sometimes impossible, to calculate the 
appropriate confidence intervals associated with these point estimates 
since the error structures in the prediction are related to those in 
the sample and, possibly, to one another. 

Under these conditions it might appear that result (33) could
be used to obtain the generalized least squares point prediction
vector where $\hat{\beta}$ is defined in result (21). However, this
predictor neglects the error component of $y_p$ which is correlated
with the error from the sample. Denote this correlation as

$$E(u u_p') = W$$

where W is of order [n×f]. It can be proved 7/ that the "best"
predictor under these conditions is given by

$$\hat{y}_p = X_p \hat{\beta} + W' \Omega^{-1} (y - X \hat{\beta}) \qquad (39)$$


where $\hat{\beta}$ is the GLS estimator (result (21)). Note that this
predictor includes an addition (to $X_p \hat{\beta}$) to account for the
unknown prediction


7/ See Johnston, pp. 212-213 and H. Theil, Principles of Econometrics, 
p. 280. 


166 


errors which are estimated by making use of the covariance W between
the sample and prediction errors, $E(u u_p')$, along with the sample
errors. The associated variance-covariance matrix is very complicated. 8/
The difficulty with implementing result (39) is that there is no way
of obtaining an estimate of W since an estimate of $u_p$ (the vector
of errors in the prediction) is not available.


8.3 Distributed Lags 9/


The response of a dependent variable to a change in one of 
the independent variables may not be immediate and complete. The 
effect may take some time before it manifests itself, and the re- 
sult may be a delayed response that is experienced over a number 
of different time periods; i.e., the effect is distributed over 
several time periods. This behaviour in time-series analysis is 
captured by means of distributed lags. 

A lagged variable is one for which the value for a current 


period was determined in an earlier period. The notation employs 


8/ See Theil, op. cit., p. 288. 


9/ See Chapter IV, Section 4.12, for the construction of lagged 
variables. Note also that the terms lag distributions, lag 
schemes and lag patterns are often used interchangeably. 


167 


a subscript t-j, where j indicates the length of the lag in
terms of the number of periods (quarters, years, or whatever).
Thus, for example, $y_{t-1}$ represents the dependent variable lagged
one period, while $x_{2,t-3}$ is the independent variable $x_2$ lagged
three periods. Since a lagged effect is often not confined to one 
period, a number of such variables may appear on the right-hand side 
of an equation. In the remainder of this chapter, the dependent 
variable y is specified to depend upon a single independent vari- 
able x via a distributed lag with the effect commencing in the 
current time period. Additional independent variables, both lagged 
and unlagged, can be added to the equation as required, and allow- 
ance can be made for distributions that do not commence at period 
zero. 

Suppose that the causal variable x results in an effect
$\beta_0 x_t$ on y in the same period (i.e., on $y_t$). This is the
immediate (or impact) effect. However, it may also result in an
effect $\beta_1 x_t$ on y in the following period ($y_{t+1}$), and an
effect $\beta_2 x_t$ on $y_{t+2}$, and so on. Alternatively, $y_t$ depends
on the effects of x in periods t, t-1, t-2 and so on. If this
distribution of effects is constant over time, the value of y in any
period may be specified as a linear function of the present and
some finite (say T) number of previous values of x, namely 10/





10/ Remember that more independent variables can be added as required. 
T represents the terminal time period. 


168 


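$$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \cdots + \beta_T x_{t-T} + u_t$$


which may be written as 11/

$$y_t = \alpha + \sum_{j=0}^{T} \beta_j x_{t-j} + u_t \qquad (40)$$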


The $\beta_j$ are often referred to as the weights or lag coefficients.
Note that there are T+2 parameters to be estimated in
equation (40), leaving n-(T+2) degrees of freedom. If T is large
this number may be disconcertingly low. Moreover, although equation
(40) can be estimated using ordinary least squares, the various
lagged values of x are likely to be highly correlated, resulting
in large standard errors on the estimated $\beta_j$ and the consequent
difficulties associated with the inference of anything useful from
them. This is the problem of multicollinearity. 12/

This problem can only be overcome by the incorporation of
additional information into the estimation process. In this particular
application the additional information takes the form of
the a priori imposition of various assumptions concerning the
relationships between the weights $\beta_0$ through $\beta_T$. This has the
effect of reducing the number of independent variables in the equation


11/ In this notation $x_{t-0}$ is equivalent to $x_t$.


12/ See Section 6.2. 


169 


and hence the number of parameters to be estimated (thus reducing 
potential multicollinearity problems and increasing the degrees of 
freedom). The most common such assumptions are discussed in the 


following sections. 


8.4 Distributed Lags - the Geometric Pattern 


In the case where all the $\beta_j$ weights are positive and
their magnitudes decline monotonically over time, it often makes
sense to impose a geometric pattern on the weights. This specifies
that the weights are related in the following way:

$$\beta_j = k \lambda^j, \qquad 0 < \lambda < 1 \qquad (41)$$

where k is some positive constant which is to be determined.
Hence

$$\beta_0 = k$$
$$\beta_1 = k\lambda = \lambda \beta_0 < \beta_0$$
$$\beta_2 = k\lambda^2 = \lambda \beta_1 < \beta_1$$


and so on. Given estimates of the two parameters in (41) (k and λ)
the complete set of weights can be determined. This suggests that
the problem has been reduced from estimating (T+2) parameters to


170 


estimating three parameters (α, k and λ). Thus, if T>1, there
is always a gain in degrees of freedom and a reduced chance of
possible multicollinearity problems.

How then are these parameters to be estimated? Substitution
of (41) into (40) yields

$$y_t = \alpha + k \sum_{j=0}^{\infty} \lambda^j x_{t-j} + u_t \qquad (42)$$


Note that T must be replaced by infinity (∞) since the geometric
scheme (41) never actually becomes zero. However, this is not an
important restriction, since if λ is small (e.g., 0.1) the weights
$\beta_j$ approach zero very rapidly. Now apply a Koyck transformation 13/
to (42); that is, lag the equation one period and multiply by λ
to obtain


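$$\lambda y_{t-1} = \lambda\alpha + k \sum_{j=0}^{\infty} \lambda^{j+1} x_{t-(j+1)} + \lambda u_{t-1}$$


which is equivalent to

$$\lambda y_{t-1} = \lambda\alpha + k \sum_{j=1}^{\infty} \lambda^{j} x_{t-j} + \lambda u_{t-1} \qquad (43)$$


Subtraction of (43) from (42) now yields

$$y_t - \lambda y_{t-1} = \alpha - \lambda\alpha + k \lambda^0 x_{t-0} + (u_t - \lambda u_{t-1})$$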


13/ This is also used in the section on autocorrelation - see 
Section 6.4. 


171 


which can be rewritten as 14/

$$y_t = \alpha(1-\lambda) + k x_t + \lambda y_{t-1} + v_t \qquad (44)$$

where $v_t = u_t - \lambda u_{t-1}$. This equation, which involves a lagged
dependent variable on the right-hand side, can be easily expressed
as

$$y_t = a + b x_t + c y_{t-1} + v_t$$


where the parameters a, b and c can be estimated using ordinary 
least squares. This estimation yields three parameter estimates, 
but these estimates are not exactly the three required parameter 


estimates. The required estimates are obtained as follows: 





$$\hat{\lambda} = c$$

$$\hat{k} = b$$

$$\hat{\alpha} = \frac{a}{1-c}$$
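
The algebra leading to (44) can be checked numerically. The sketch below is illustrative only: the parameter values and the x series are invented, the error term is set to zero, and x is assumed to be zero before the first observation so that the infinite sum in (42) is finite.

```python
alpha, k, lam = 2.0, 0.5, 0.6
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # x is taken to be zero before period 0

def y_from_lags(t):
    # noise-free version of (42); the sum stops where the zero pre-sample x begins
    return alpha + k * sum(lam ** j * x[t - j] for j in range(t + 1))

y = [y_from_lags(t) for t in range(len(x))]

# right-hand side of (44) with v_t = 0, for every period that has a lagged y
gaps = [abs(y[t] - (alpha * (1 - lam) + k * x[t] + lam * y[t - 1]))
        for t in range(1, len(x))]
max_gap = max(gaps)
print(max_gap < 1e-9)
```

The two forms agree to rounding error, so a regression of y on x and lagged y would have intercept α(1-λ), slope k on x and slope λ on lagged y in this noise-free case.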


The standard errors of c and b therefore correspond to the
standard errors of $\hat{\lambda}$ and $\hat{k}$, respectively, but note that
the standard error of $\hat{\alpha}$ must be approximated since $\hat{\alpha}$
is a non-linear function of a and c. 15/ Given the above estimates of k


14/ See footnote 11. Note that in theory if the original error
term $u_t$ was not autocorrelated, then the error term $v_t$ will
be first order autocorrelated, whereas if $u_t$ was first order
autocorrelated, then $v_t$ will not be autocorrelated. See also
footnote 19.


15/ See J. Kmenta, Elements of Econometrics, p. 283. 


172 


and λ, estimates of all of the $\beta_j$ may then be obtained from
(41). These weights will exhibit a geometrically declining pattern.
The speed of decline will depend on the magnitude of λ. A high
λ (e.g., 0.9) results in a slow decline, while a low λ (e.g.,
0.1) leads to a fast decline, as is illustrated in the following
table and diagram for two arbitrary values of k:


[Table: Weights $\beta_j$ for λ = 0.9 and λ = 0.1]


[Diagram: $\beta_j$ against j for k=1]
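
A few lines of Python reproduce the kind of pattern described, using result (41). The function name is hypothetical, and k=1 with five weights is an illustrative choice rather than the values used in the original table.

```python
def geometric_weights(k, lam, n_lags):
    """Weights beta_j = k * lam**j for j = 0..n_lags, as in result (41)."""
    return [k * lam ** j for j in range(n_lags + 1)]

slow = geometric_weights(1.0, 0.9, 4)   # high lambda: slow decline
fast = geometric_weights(1.0, 0.1, 4)   # low lambda: fast decline
print([round(w, 4) for w in slow])
print([round(w, 4) for w in fast])
```

With λ = 0.9 the fifth weight is still about two thirds of the first, whereas with λ = 0.1 it is one ten-thousandth of it.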





173 


A low λ therefore leads to more myopic behaviour since
the weights on past values of x decline rapidly, so that the
behaviour of x more than roughly three time periods in the past
has little effect (relative to the more recent behaviour) on the
present value of y (i.e., $y_t$). 16/ To demonstrate this, note
that the individual $\beta_j$ weights can be expressed as percentages
of the total (or sum) of the $\beta_j$, denoted β:

$$\beta = \sum_{j=0}^{\infty} \beta_j \qquad (45)$$

That is, define the normalised weights 17/ as

$$w_j = \frac{\beta_j}{\beta}$$

so that

$$\beta_j = \beta w_j. \qquad (46)$$

The $w_j$ (multiplied by 100) give the desired percentages. Now
the geometric pattern is specified in (41) so that

$$\beta_j = k \lambda^j = \beta w_j.$$


16/ This is a general statement - the absolute effects depend 
on the units of measurement and the estimated value of k. 


17/ Weights that sum to 1 (100%). Since β is a constant, summing
both sides of (46) yields $\sum_{j=0}^{\infty} \beta_j = \beta \sum_{j=0}^{\infty} w_j$
which means that, in order to be consistent with (45),
$\sum_{j=0}^{\infty} w_j = 1$.


174 


However, to satisfy the normalisation requirement, 18/

$$w_j = (1-\lambda)\lambda^j \qquad (47)$$

so that

$$k = \beta(1-\lambda)$$

and an estimate of the sum of the weights (or lag coefficients)
is given by

$$\hat{\beta} = \frac{\hat{k}}{1-\hat{\lambda}} \qquad (48)$$
li 





Since this involves a non-linear combination of the estimated
parameters, it is not possible to obtain an exact estimate of the standard
error of $\hat{\beta}$ in this case (see footnote 15).
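
The step from the three regression coefficients of the transformed equation to the structural parameters and the sum of the weights is simple arithmetic, sketched here in Python. The function name and the coefficient values are hypothetical.

```python
def koyck_structural(a, b, c):
    """Recover lambda, k, alpha and the sum of weights from y_t = a + b*x_t + c*y_{t-1}."""
    if not 0 < c < 1:
        raise ValueError("geometric interpretation needs 0 < c < 1")
    return {
        "lambda": c,                 # speed-of-decline parameter
        "k": b,                      # impact weight beta_0
        "alpha": a / (1 - c),        # intercept of the original equation
        "beta_sum": b / (1 - c),     # result (48): sum of the lag coefficients
    }

est = koyck_structural(a=0.8, b=0.5, c=0.6)
print(est)
```

Standard errors for `alpha` and `beta_sum` are not produced, since both are non-linear functions of the estimated coefficients.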

Given an estimate of λ, it is result (47) that expresses
the individual weights as a percentage of the sum of weights. The
application to the previous table now produces the following:








Normalised Weights (λ = 0.1)

  j       $w_j$
  0      0.9000
  1      0.0900
  2      0.0090
  3      0.0009
  ...
  Sum    1.0000


18/ See previous footnote. Note that the sum of the geometric
progression $\sum_{j=0}^{\infty} \lambda^j = \frac{1}{1-\lambda}$.





175 


Note that with a low λ of 0.1, 99.9% of the total is captured
with the first three weights, while with a high λ of 0.9 only
27.1% of the total is captured by the same number of weights.
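
These percentages follow directly from result (47) and can be verified with a short Python calculation (the function name is hypothetical):

```python
def normalised_weights(lam, n_terms):
    """First n_terms normalised weights w_j = (1 - lam) * lam**j from result (47)."""
    return [(1 - lam) * lam ** j for j in range(n_terms)]

share_low = 100 * sum(normalised_weights(0.1, 3))    # low lambda
share_high = 100 * sum(normalised_weights(0.9, 3))   # high lambda
print(round(share_low, 1), round(share_high, 1))     # 99.9 27.1
```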

In summary, when all the $\beta_j$ weights are positive and
declining, a geometric pattern can be imposed on the weight distribution,
serving as the additional information needed to overcome the degrees
of freedom and potential multicollinearity problems. This results
in an equation in which there are only three coefficients to be
estimated and which includes a lagged dependent variable on the
right-hand side 19/ (result (44)). The estimated coefficient on
this variable reflects the behaviour of the lag distribution, with
a value close to zero indicating myopic behaviour. For the geometric
distribution interpretation to be valid, the estimated
coefficient must be between 0 and 1. The percentage distribution
(result (47)), the individual $\beta_j$ weights (result (46)) and the
sum of the lagged coefficients (result (48)) can all be obtained,
as can the intercept of the original equation.


Both the advantages and disadvantages 20/ of this approach
lie in its simplicity. It requires that all the weights be positive


19/ Remember that under these circumstances the Durbin-Watson
statistic is biased towards accepting the null hypothesis of
no autocorrelation (see Section 6.4). If the $v_t$ are
non-autocorrelated then ordinary least squares will produce
biased estimates in small samples, but if the $u_t$ are
non-autocorrelated (which means that the $v_t$ are autocorrelated) then
least squares estimators are inconsistent. See Johnston, pp.
304-309. For a description of inconsistency, see Section 8.1.
Also see Section 9.2 for a further elaboration, and footnote
24 of Chapter VI.

20/ This formulation can also be justified by reference to economic 
models, particularly the partial adjustment or adaptive expec- 
tations models. See Johnston, pp. 300-303. 


176 


and declining, and this may not be a reasonable assumption. Moreover,
if there is more than one lagged independent variable, this
scheme forces the same percentage distribution (that is, value of
λ) on the lag schemes of all of the independent variables (otherwise
the Koyck transformation is not easily applied). These
restrictions have led to the development of a much more flexible
lag distribution scheme which is described in the next section.


8.5 Polynomial Distributed Lags (Almon Lags) 


This scheme simply assumes that the $\beta_j$ weights follow
a polynomial of some specified order. This means that, for example, 
the weight pattern may first rise and then fall. It also means 
that negative weights are permitted and that different weight pat- 
terns can be imposed on different independent variables. As in the 
preceding section, the development is presented in terms of a lag 
scheme on a single independent variable -- generalisations will be 
mentioned at the end of this section. 

If the order (or degree) of the polynomial to be imposed
is denoted r, then the $\beta_j$ weights in (40) are determined
according to the polynomial 21/

$$\beta_j = a_0 + a_1 j + a_2 j^2 + \cdots + a_r j^r \qquad (49)$$


21/ Johnston, p. 295, uses somewhat different notation. The notation
in this section does not follow that of Johnston.


177 


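for j=0 to T, so that

$$\beta_0 = a_0$$
$$\beta_1 = a_0 + a_1 + a_2 + \cdots + a_r$$
$$\beta_2 = a_0 + 2a_1 + 4a_2 + \cdots + 2^r a_r$$
$$\vdots$$
$$\beta_T = a_0 + T a_1 + T^2 a_2 + \cdots + T^r a_r$$

which can be written in matrix form as

$$b = Ka \qquad (50)$$

where b is the [(T+1)×1] vector of lag coefficients and

$$K = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 4 & \cdots & 2^r \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & T & T^2 & \cdots & T^r \end{bmatrix}$$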

This is a [(T+1)×(r+1)] matrix of coefficients completely specified
by the length of the lag (T) and the order (r) of the polynomial.
The vector a contains the r+1 parameters which are to be estimated.
Since r is usually less than or equal to 4, this means that
up to 5 parameters must be estimated. Note that it is sensible to
impose (49) on the original specification (40) only if there are


178 


fewer 'a' parameters to be estimated than there are $\beta_j$ parameters
in the original specification. Also note that it is only the order
of the polynomial (or the number of possible turning points) that
is imposed; nothing concerning the signs or rise and/or fall of
the weights is imposed. The exact shape of the $\beta_j$ distribution
is determined by the estimates obtained for the a parameters in
(49).
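
The mapping from the polynomial parameters to the lag weights in (49) and (50) can be sketched in Python as follows. The function names and the values of the a parameters are hypothetical; the example uses a quadratic (r=2) with T=3.

```python
def almon_matrix(T, r):
    """The [(T+1) x (r+1)] matrix K of result (50): row j holds j**0, j**1, ..., j**r."""
    return [[j ** i for i in range(r + 1)] for j in range(T + 1)]

def lag_weights(a, T):
    """b = K a: the T+1 lag coefficients implied by polynomial parameters a_0..a_r."""
    K = almon_matrix(T, len(a) - 1)
    return [sum(kji * ai for kji, ai in zip(row, a)) for row in K]

K = almon_matrix(T=3, r=2)
a = [0.9, -0.1, -0.05]          # hypothetical a_0, a_1, a_2
b = lag_weights(a, T=3)
print(K)
print([round(v, 4) for v in b])
```

Three parameters here determine four lag weights; with a longer lag the saving in parameters is correspondingly larger.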

To see how estimates can be obtained for these parameters,
substitute (49) into (40), so that

$$y_t = \alpha + \sum_{j=0}^{T} (a_0 + a_1 j + a_2 j^2 + \cdots + a_r j^r) x_{t-j} + u_t$$

and expand the right-hand side, taking the constant parameters
outside the summation signs, to obtain

$$y_t = \alpha + a_0 \sum_{j=0}^{T} x_{t-j} + a_1 \sum_{j=0}^{T} j x_{t-j} + a_2 \sum_{j=0}^{T} j^2 x_{t-j} + \cdots + a_r \sum_{j=0}^{T} j^r x_{t-j} + u_t \qquad (51)$$

Now define the new variables

$$z_{t,0} = \sum_{j=0}^{T} x_{t-j} = x_t + x_{t-1} + x_{t-2} + \cdots + x_{t-T}$$

$$z_{t,1} = \sum_{j=0}^{T} j x_{t-j} = x_{t-1} + 2x_{t-2} + \cdots + T x_{t-T}$$


179 


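$$z_{t,2} = \sum_{j=0}^{T} j^2 x_{t-j} = x_{t-1} + 4x_{t-2} + \cdots + T^2 x_{t-T}$$

$$z_{t,r} = \sum_{j=0}^{T} j^r x_{t-j} = x_{t-1} + 2^r x_{t-2} + \cdots + T^r x_{t-T}$$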


and rewrite (51) in terms of these variables, so that

$$y_t = \alpha + a_0 z_{t,0} + a_1 z_{t,1} + a_2 z_{t,2} + \cdots + a_r z_{t,r} + u_t \qquad (52)$$


This is nothing but a usual multiple regression with $y_t$ as the
dependent variable and the z's as the independent variables.
Consequently the (r+2) parameters in (52) - (r+1) a's and the
intercept α - can be estimated using ordinary least squares. 22/
Given these estimates of the a's, estimates of the $\beta_j$ can be
obtained from (49) and the distribution of weights is determined.
A plot of $\beta_j$ against j (see Section 8.4) is often considered
useful.
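
The construction of the z variables can be sketched in Python. The function name, the x series and the a values are invented for illustration; the check confirms that the lag form (40) and the z form (52) give the same systematic part.

```python
def almon_z(x, T, r):
    """Rows t = T..n-1, columns z_t0..z_tr, where z_ti = sum over j of j**i * x[t-j]."""
    return [[sum(j ** i * x[t - j] for j in range(T + 1)) for i in range(r + 1)]
            for t in range(T, len(x))]

x = [2.0, 4.0, 3.0, 5.0, 7.0, 6.0]
T, r = 3, 2
a = [0.9, -0.1, -0.05]                       # hypothetical a_0, a_1, a_2
Z = almon_z(x, T, r)                         # (n - T) rows, (r + 1) columns
beta = [sum(a[i] * j ** i for i in range(r + 1)) for j in range(T + 1)]   # result (49)

gaps = []
for row, t in zip(Z, range(T, len(x))):
    direct = sum(beta[j] * x[t - j] for j in range(T + 1))   # lag form, as in (40)
    via_z = sum(a[i] * row[i] for i in range(r + 1))         # z form, as in (52)
    gaps.append(abs(direct - via_z))
print(len(Z), len(Z[0]), max(gaps) < 1e-9)
```

The first T observations are lost because the lagged values of x are not available for them, which is why Z has n-T rows.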

Denoting by Z the [(n-T)×(r+1)] 23/ matrix of independent
variables excluding (the column of ones for) the intercept, the
variance-covariance matrix of estimated coefficients is given by 24/

$$\mathrm{var}(\hat{a}) = S^2 (Z'Z)^{-1}.$$


22/ See Chapter III. 

23/ n observations minus the observations "lost" by the lag of 
maximum length T of the (r+1) constructed z variables. 
Johnston calls this matrix W (see p. 295). 


24/ Analogous to result (9) in Chapter III. 


180 


The estimated sampling variances of the (T+1) individual $\hat{\beta}_j$ are
linear combinations of these variances (following (50)) and are
given by 25/

$$\mathrm{var}(\hat{b}) = S^2 K (Z'Z)^{-1} K' \qquad (53)$$

The standard errors of the (T+1) individual $\hat{\beta}_j$ are given by the
square roots of the diagonal elements of this matrix.

In equation (52) there are as many independent variables as
one plus the imposed order of the polynomial. This is a general
conclusion. For example, if a cubic is imposed (r=3), there will
be four independent (z) variables and five parameters to be estimated
(α, $a_0$, $a_1$, $a_2$ and $a_3$). However, this general conclusion
is no longer valid when restrictions are also imposed on the lag
distribution. These usually take the form of a near-point zero
restriction or a far-point zero restriction, or both. 26/ Examples
are illustrated in the diagrams below.





[Diagrams: Near-Point Zero Restriction; Far-Point Zero Restriction]


25/ See Johnston, p. 295. 


26/ They may also take the form of zero slope restrictions, but 
these will not be considered further here. 


181 


Suppose a near-point zero restriction is imposed on the lag
distribution. This means that

$$a_0 = 0$$

since $\beta_0 = a_0$ from (49). Hence the coefficient on the variable
$z_{t,0}$ in (52) is constrained to zero, so that the variable is
eliminated from the equation. A near-point zero restriction has eliminated
one variable and therefore one coefficient from (52).

Similarly, it can be shown that the imposition of a far-point
zero restriction on the lag distribution will also eliminate one
variable from (52). For example, if a quadratic is imposed (r=2)
and T=3, then

$$\beta_3 = a_0 + 3a_1 + 9a_2 = 0$$

or

$$a_0 = -3(a_1 + 3a_2).$$

Consequently, the right-hand side of (52), which, since r=2, has
three variables, namely

$$a_0 z_{t,0} + a_1 z_{t,1} + a_2 z_{t,2}$$


182 
can now be written as

$$a_1 (z_{t,1} - 3 z_{t,0}) + a_2 (z_{t,2} - 9 z_{t,0}).$$

This contains only two variables, say

$$a_1 \tilde{z}_{t,1} + a_2 \tilde{z}_{t,2}$$

where

$$\tilde{z}_{t,1} = z_{t,1} - 3 z_{t,0}$$

and

$$\tilde{z}_{t,2} = z_{t,2} - 9 z_{t,0}.$$

By defining results (52) and (53) in terms of these new $\tilde{z}$
variables, the lag coefficients and their associated standard errors
can now be estimated.
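
The far-point example (r=2, T=3) can be checked numerically in Python. The values of the two free parameters and of the z variables are hypothetical; only the restriction and the transformed variables come from the text.

```python
T = 3
a1, a2 = 0.2, -0.1                    # hypothetical free parameters
a0 = -3 * (a1 + 3 * a2)               # implied by the restriction beta_3 = 0

beta = [a0 + a1 * j + a2 * j ** 2 for j in range(T + 1)]   # result (49) with r = 2

z0, z1, z2 = 14.0, 17.0, 37.0         # one observation on z_t0, z_t1, z_t2
unrestricted_form = a0 * z0 + a1 * z1 + a2 * z2
restricted_form = a1 * (z1 - 3 * z0) + a2 * (z2 - 9 * z0)  # two transformed variables

print([round(b, 4) for b in beta])
print(round(unrestricted_form, 4), round(restricted_form, 4))
```

The last weight is zero by construction, and the two expressions coincide, so only two slope coefficients need to be estimated.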

The following general result therefore holds for polynomial
distributed lags: given a value of r (order of polynomial) and a
value of T (maximum lag length), there will be r+1 slope coefficients
to estimate in addition to the intercept, with one slope
coefficient being eliminated for each zero restriction imposed.
The imposition of a near-point and a far-point zero restriction
together will therefore eliminate two variables from the basic equation
(52). These conclusions are valid for each polynomial distributed
lag introduced into (52). 27/ Note that other (non-lagged) variables
may also be introduced into the equation.


To employ a polynomial distributed lag technique, then, the 


27/ Recall that these conclusions assume that the lag distribution 
commences in period 0. This assumption can be easily relaxed. 


183 
analyst must specify 


(i) the order of the polynomial (r); 
(ii) the maximum lag length (T); and 
(iii) the existence, if any, of zero point 
restrictions (near and/or far). 

Since this is the additional information which is being imposed in 
an effort to help overcome multicollinearity problems, it should 
be determined before the equation (40) is estimated; that is, the 
analyst should consult the relevant theory, the results of previous 
research and institutional arrangements to determine an appropriate 
shape (and length) for the lag distribution, and this should be 
completed before the estimation procedure. Usually r lies between 
1 and 4 and T does not generally exceed 5 years (or 20 quarters) -- 
but it must be emphasized that the choices depend on the problem 
at hand. 

The lag coefficients should be subjected to the same scrutiny
as to sign and size as any estimated coefficients. Since
standard errors are available from (53) they can also be subjected
to the same t-tests. 28/ The statistics for equation (52), including
R², etc., are also available to assist in the final assessment of
the estimated equation.

In addition, some further information about the lag distribution
is often useful. The sum of the lag coefficients is defined 29/ as


28/ See Section 5.3. 


184 


$$\hat{\beta} = \sum_{j=0}^{T} \hat{\beta}_j \qquad (54)$$

with the associated variance

$$\mathrm{var}(\hat{\beta}) = S^2 \, 1 K (Z'Z)^{-1} K' 1' \qquad (55)$$


where 1 is a [1×(T+1)] row vector of ones. The average lag is
defined 30/ as


T A 
B= 296, (56 ) 


with the associated variance 


a 


var(B) = S%JK(Z"Z) "EA f (57) 


where J is a row vector of the form


d-s Kë Bye. hal ts 


Since some of the $\hat{\beta}_j$ might be negative, care must be taken in the
interpretation of these two additional results.


The PDLAG Program 


The program which performs polynomial distributed lag 


29/ This is the same as β in the previous section (see result (48)).
$\beta_0$ is sometimes referred to as the impact multiplier, while β
is called the long run multiplier.


30/ For the geometric distribution this is λ/(1-λ). See Johnston,
p. 299.


185 


analyses is called PDLAG. It is stored in the public libraries in 


the workspace 32 PDLAG. To execute the program, type 


PDLAG 


and respond to the questions accordingly. 

The user is first asked to enter the dependent variable, 
then a matrix of all independent variables which are not to be 
lagged. If there are no independent variables in the equation 
in this category, enter 0. 

The imposed lag schemes must now be defined. In general, 


these are to be entered as follows: 
V LAG VAR 


where VAR is the independent variable on which the lag scheme is 


to be imposed and V is a 5-element vector containing 


(i) order of polynomial (r); 


(ii) beginning of lag period (0 or some 
positive integer); 


(iii) end of lag period (T); 
(iv) near-point zero constraint (0 or 1); 


(v) far-point zero constraint (0 or 1). 


The example chosen to illustrate the PDLAG program is an extension 
of the simple consumption function specification used in Examples 4 to 7 
above. Instead of postulating that real (constant dollar) consumption 
depends on real disposable income in the same period, the specification is 


extended by postulating that consumption depends on permanent income, where 


186 


permanent income is assumed to be generated by a distributed lag on the 
current and past values of real disposable income. The weights are expect- 
ed to eventually decline to zero so an end-point zero constraint is imposed. 
A quadratic (r = 2) lag pattern is chosen for illustrative purposes since 
this can admit either a monotonically declining weight pattern or a pattern 
containing a maximum weight on other than the current level of disposable 
income. Standard statistical tests on subsets of lag weights (see Section 
5.5) suggest that an appropriate end-point constraint is $\beta_3 = 0$.

The resulting parameter estimates are presented as Example 9, which 
also illustrates the output for the PDLAG program. Note that the estimated 
lag distribution displays a maximum weight on the current level of disposa- 
ble income and that the estimated sum of the lag coefficients (result (54) 
above) is 0.80 which is very close to the estimated slope coefficient 
obtained in each of the Examples 4 through 7 above. 
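
The sum quoted here can be reproduced from the lag coefficients reported in Example 9. The Python sketch below takes the lag-2 coefficient to be negative and, as an assumption, computes the average lag as the weighted mean of the lag lengths (the sum of j times each coefficient, divided by the sum of the coefficients), which is the form consistent with the geometric value λ/(1-λ) cited in footnote 30.

```python
lag_coefficients = [0.63661, 0.18935, -0.02286, 0.0]   # lags 0..3 as read from Example 9

total = sum(lag_coefficients)                           # result (54)
# assumed normalised definition of the average lag
mean_lag = sum(j * b for j, b in enumerate(lag_coefficients)) / total

print(round(total, 4), round(mean_lag, 4))
```

The total rounds to 0.80, and the short average lag reflects the heavy weight on current income.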

However, this result is still unsatisfactory from an autocorrelation
viewpoint, as indicated by the low Durbin-Watson statistic (see Section 6.4).
This result suggests that the simple theoretical model should be respecified
(e.g., to include additional explanatory variables if relevant), or that the
estimated parameters should be corrected for autocorrelation. Such correction
can be done practically by choosing an estimated value of ρ and applying
the PDLAG program to the appropriately transformed variables. A search over
different values of ρ analogous to the procedures outlined in Section 6.4
would be appropriate. Alternatively, the analyst could construct the
relevant $z_{t,j}$ variables using the method presented above and then apply
one of the estimation procedures outlined in Sections 6.4, 7.2 or the
Appendix to obtain a new vector of estimated parameters.


187 


Example 9


      PDLAG
ENTER DEPENDENT VARIABLE
⎕:
      CON

ENTER MATRIX OF INDEPENDENT VARIABLES WHICH ARE NOT TO BE LAGGED (0 IF NONE)
⎕:
      0

ENTER HYPOTHESIZED LAG DISTRIBUTION AND INDEPENDENT VARIABLE
⎕:
      2 0 3 0 1 LAG PDI
ANOTHER INDEPENDENT VARIABLE TO BE LAGGED?
NO
ALIGN PAPER


ORDINARY LEAST SQUARES


                   ESTIMATED      STANDARD
                  COEFFICIENT       ERROR        T-VALUE
CONSTANT          6611.06371     569.45846      11.60939
A 1                  0.63661       0.14132       4.50460
A 2                 ¯0.56479       0.26298      ¯2.14763

R-SQUARED .................... 0.99423
RBAR-SQUARED ................. 0.99397
F-STATISTIC ( 2, 43) ......... 3707.147204
STANDARD ERROR OF THE ESTIMATE 762.55963
COEFFICIENT OF VARIATION...... 1.841763
DURBIN-WATSON STATISTIC ...... 0.51934


DISTRIBUTED LAG INTERPRETATION : LAGGED VARIABLE 1


LAG     COEFFICIENT     STD. ERROR      T-VALUE
 0        0.63661        0.13999        4.54750
 1        0.18935        0.04853        3.90148
 2       ¯0.02286        0.09519       ¯0.24012
 3        0.00000


188 


8.6 Other Distributed Lag Patterns 


There have been many other distributed lag patterns proposed
over the years, but they are not nearly as widely used as the two
patterns discussed above. Moreover, the polynomial pattern is very
flexible, and as such can be used to approximate almost any proposed
pattern. For example, a polynomial of zero order (r=0) specifies
a uniform distribution, while a polynomial of the second or third
order can often approximate a geometric distribution. This is useful
if the analyst does not wish to impose the same distribution on all
variables.31/ DeLeeuw's inverted V can be closely approximated by a
quadratic (r=2) with near- and far-point zero restrictions, while the
Solow Pascal distributions can be approximated by a polynomial of
order 3 or 4.

31/ See Section 8.4.

Consequently, rather than attempting to study the vast array
of proposed lag distributions, it is recommended that the analyst
concentrate on the contents of the next section and the previous
two sections, which discuss the most practical and flexible
methods of estimating distributed lags.
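
The flexibility of the polynomial pattern can be illustrated numerically. The short Python sketch below is not part of the SHARP APL library; the function names are hypothetical, and the quadratic assumes that the near- and far-point zeros are placed just outside the lag window (at j = -1 and j = T+1), which may differ from the convention used by PDLAG.

```python
def polynomial_lag_weights(coeffs, T):
    """Lag weights beta_j = sum_i coeffs[i] * j**i for j = 0, ..., T."""
    return [sum(a * j ** i for i, a in enumerate(coeffs)) for j in range(T + 1)]


def quadratic_with_end_zeros(T, scale=1.0):
    """Quadratic lag weights that vanish at j = -1 and j = T + 1 (assumed placement)."""
    return [scale * (j + 1) * (T + 1 - j) for j in range(T + 1)]


T = 4
uniform = polynomial_lag_weights([0.2], T)   # r = 0: a constant polynomial
hump = quadratic_with_end_zeros(T)           # r = 2: symmetric, single-peaked

print("uniform (r=0):", uniform)
print("hump    (r=2):", hump)
```

The zero-order polynomial gives equal weights at every lag, while the restricted quadratic rises to a single peak in the middle of the lag window and falls away symmetrically, which is the general shape of an inverted V.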




8.7 Shiller Lags 


As discussed in Section 8.3, the estimation of a causal relation- 
ship such as (40) involving a number of lagged variables as independent 
variables can lead to the problem of multicollinearity. The difficulty 
arises not only because of low degrees of freedom (a result of the 
large number of parameters which must be estimated) but also because 
there is too little variety in the existing observations, i.e., the 
incorporation of further lags on the same independent variable provides 
insufficient additional information to allow proper estimation of the 
resulting parameters. Recall that the remedy for the problem of 
multicollinearity is the incorporation of additional information. In 
the case of a polynomial distributed lag analysis (as discussed in 
Section 8.5) this additional information took the form of an a priori 
imposition of a polynomial distribution on the estimated lag coeffi- 
cients. This section presents an alternative technique for the 
incorporation of additional information into a model involving lagged 
independent variables. 

Shiller lags32/ evolved from the fact that, in many applications
of the linear distributed lag model for estimation, the analyst feels
that there exists some prior knowledge that the lag coefficients
should trace a "smooth" or simple curve. The Shiller method of
distributed lag analysis proceeds from the basic objective of smooth-
ness and develops a method of imposing this prior belief of smoothness
upon the estimated lag coefficients without having to constrain the
parameter values themselves. (Note that this is quite different from
an Almon lag analysis, where an exact polynomial distribution is
imposed upon the lag coefficients.)

32/ See R.J. Shiller, "A Distributed Lag Estimator Derived from
Smoothness Priors," Econometrica, Vol. 41 (1973), pp. 775-788.

For the purposes of a Shiller lag analysis, a curve is defined 
to be smooth if the rate at which its slope changes is small. In this 
sense, a straight line is perfectly smooth. A smoothness prior (belief) 
then amounts to a belief that the rate of change of the slope of the 
curve traced by the parameters in a distributed lag should be small in 
absolute value. More formally, recall the distributed lag model 


$$y_t = \alpha + \sum_{j=0}^{T} \beta_j x_{t-j} + u_t, \qquad t = 1, 2, \ldots, n. \tag{40}$$

In this context, a smoothness prior can then be stated as a belief that 
the differences between successive parameters in equation (40) take on 
arbitrarily small values. Different degrees of differencing (denoted 
d) can be considered. The two most commonly used are first-order
differencing (d=0)33/ which employs T differences of the form

$$\beta_j - \beta_{j+1}, \qquad j = 0, \ldots, T-1,$$

and second-order differencing (d=1) which uses (T-1) differences of the
form

$$(\beta_j - \beta_{j+1}) - (\beta_{j+1} - \beta_{j+2}), \qquad j = 0, \ldots, T-2.$$

33/ A tradition appears to have been established which specifies that
d equal the degree of differencing minus one.


The smoothness prior then specifies that the chosen differences are all 
to be small in absolute value. In other words, for d=1, it specifies
that the second differences of the lag coefficients should vary around
zero, so that the probability of the absolute value of these differences
satisfies

$$P\left[\, \left| (\beta_j - \beta_{j+1}) - (\beta_{j+1} - \beta_{j+2}) \right| > \delta \,\right] < \varepsilon$$

for arbitrary $\delta > 0$, $\varepsilon > 0$ and $j = 0, \ldots, T-2$. The more strongly the
smoothness prior is held, the smaller will be the tolerable range
around 0 (i.e., the smaller is $\delta$). A point worth noting is that
no prior belief about any one particular parameter estimate is being
maintained; the only prior belief held concerns the probability
distribution of the differences of the lag coefficients.

Implementation of a Shiller lag analysis proceeds as follows:
the constraint equations can be written in nonstochastic form, e.g.,
for d=1,

$$(\beta_j - \beta_{j+1}) - (\beta_{j+1} - \beta_{j+2}) = 0, \qquad j = 0, \ldots, T-2,$$

which can be expressed in general matrix notation as

$$R_d\, b = 0 \tag{58}$$



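where b is the [(T+1)x1] distributed lag coefficient vector (as in
(38) above) and $R_d$ is a [(T-d)x(T+1)] matrix of restriction
coefficients. For example, in the case of d=1,

$$R_1 = \begin{bmatrix}
1 & -2 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & \cdots & 0 & 1 & -2 & 1
\end{bmatrix}$$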


Following the approach outlined in Section 7.4 above (see,
especially, equation (30)), a stochastic version of (58) can be combined
with equation (40) and written as

$$\begin{bmatrix} y \\ 0 \end{bmatrix} = \begin{bmatrix} X \\ R_d \end{bmatrix} b + w \tag{59}$$

where X is an [(n-T)x(T+1)] matrix of $x_j$ vectors and

$$w = \begin{bmatrix} u \\ v \end{bmatrix}$$

where u is the [(n-T)x1] vector of errors from equation (40) and
v is a [(T-d)x1] vector of errors permitted around the second
difference constraint equation (58). Note that T observations are
'lost' from y and X as a result of the lagging procedure. For
example, for T=4, $R_1$ would be [3x5] and equation (59) would appear
as 34/

$$\begin{bmatrix} y_5 \\ y_6 \\ \vdots \\ y_n \\ 0 \\ 0 \\ 0 \end{bmatrix}
=
\begin{bmatrix}
x_5 & x_4 & x_3 & x_2 & x_1 \\
x_6 & x_5 & x_4 & x_3 & x_2 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
x_n & x_{n-1} & x_{n-2} & x_{n-3} & x_{n-4} \\
1 & -2 & 1 & 0 & 0 \\
0 & 1 & -2 & 1 & 0 \\
0 & 0 & 1 & -2 & 1
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix}
+ w$$

where $x_t$ is the observation on the variable x in period t and the
last three rows represent the second difference smoothness constraints
which are to be imposed (with a random component).
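
The restriction matrix is simple to build mechanically. The following Python sketch (the function names are hypothetical and it is not part of the SHARP APL library) constructs $R_d$ for d=0 and d=1 under the differencing convention used in the text, and confirms that a straight-line lag pattern has zero second differences.

```python
def difference_matrix(T, d):
    """Return R_d as a list of (T - d) rows, each with T + 1 entries."""
    if d not in (0, 1):
        raise ValueError("this sketch only covers d = 0 and d = 1")
    stencil = [1.0, -1.0] if d == 0 else [1.0, -2.0, 1.0]
    rows = []
    for j in range(T - d):
        row = [0.0] * (T + 1)
        row[j:j + len(stencil)] = stencil   # place the difference pattern at lag j
        rows.append(row)
    return rows


def matvec(R, b):
    """Matrix-vector product R b."""
    return [sum(r_i * b_i for r_i, b_i in zip(row, b)) for row in R]


R1 = difference_matrix(4, 1)   # T = 4, second differences: a 3 x 5 matrix
for row in R1:
    print(row)

# A straight-line lag pattern is perfectly smooth: all second differences vanish.
print(matvec(R1, [5.0, 4.0, 3.0, 2.0, 1.0]))
```

For T=4 and d=1 this reproduces the three constraint rows that appear at the bottom of the stacked system.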

By appropriately accounting for the intercept, $\alpha$, in equation
(40) and any other lagged or unlagged independent variables that might
be appropriately included in the specification, equation (59) could be
estimated by ordinary least squares (using (n-T)+(T-d)=(n-d) "observ-
ations") under the assumption that w is homoscedastic; that is,
that both u and v are homoscedastic with the same variance.35/ If this
assumption did not hold then equation (59) would have to be estimated by
generalized least squares (see Section 7.4). Note, however, that this
would require that the error structure of v be specified a priori,
which would not be an easy task. For this reason an extension of the
technique was developed in the original paper.

34/ This formulation imposes the additional constraint that $\alpha = 0$ in
equation (40) and assumes that no other independent variables enter
the regression. These restrictions can be easily relaxed by expand-
ing X and b accordingly (see below).

35/ Note that, for the standard statistical tests to be applied, both
u and v must be assumed to be normally distributed with the
usual assumptions.

The Shiller lag technique as discussed to this point also limits 
the analyst with respect to control of the weight which is to be applied 
to the additional constraints imposed by the prior beliefs being 
considered. Some mechanism by which this weight could change with the 
firmness of the convictions of the analyst would be convenient. Variation
in the strength of the prior beliefs imposed is accomplished by modifying
the smoothing constraints (58) to

$$k R_d\, b = 0, \qquad k > 0 \tag{60}$$

where k is a constant or scalar (called the tightness parameter). The
choice of the value of k is important since it can permit equation (59)
to be estimated by ordinary least squares. To see this, incorporate (60)
into (59) as

$$\begin{bmatrix} y \\ 0 \end{bmatrix} = \begin{bmatrix} X \\ k R_d \end{bmatrix} b + \begin{bmatrix} u \\ k v \end{bmatrix} \tag{61}$$

and note that an appropriate choice of k can ensure that the variance
of u and the variance of kv are identical, hence satisfying the
conditions for ordinary least squares estimation (assuming they are
both homoscedastic).
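
The stacked regression (61) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (d=1, no intercept, no other regressors, no end-point rows); the function names and the data are hypothetical, and solving the normal equations by Gaussian elimination is a choice made here for self-containment, not a description of how the SHILLER program works.

```python
def solve(A, c):
    """Solve the square system A x = c by Gaussian elimination with pivoting."""
    n = len(c)
    M = [A[i][:] + [c[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= factor * M[col][cc]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y_t for row, y_t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def shiller_lag(x, y, T, k):
    """Estimate b in (61) for d = 1, with no intercept and no end-point rows."""
    n = len(x)
    X = [[x[t - j] for j in range(T + 1)] for t in range(T, n)]   # T rows are lost
    constraints = []
    for j in range(T - 1):
        row = [0.0] * (T + 1)
        row[j:j + 3] = [k, -2.0 * k, k]                            # k times a row of R_1
        constraints.append(row)
    return ols(X + constraints, list(y[T:]) + [0.0] * len(constraints))


def roughness(b):
    """Sum of squared second differences of the lag coefficients."""
    return sum((b[j] - 2 * b[j + 1] + b[j + 2]) ** 2 for j in range(len(b) - 2))


# Hypothetical data: a jagged "true" lag pattern and an error-free dependent variable.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 3.0, 8.0, 2.0, 6.0]
true_b = [0.5, 0.0, 0.4, 0.1]
y = [0.0] * 3 + [sum(true_b[j] * x[t - j] for j in range(4)) for t in range(3, len(x))]

b_loose = shiller_lag(x, y, T=3, k=0.0)     # k = 0: plain OLS on (40)
b_tight = shiller_lag(x, y, T=3, k=100.0)   # large k: the smoothness prior dominates
print([round(v, 4) for v in b_loose])
print([round(v, 4) for v in b_tight], roughness(b_tight) < roughness(b_loose))
```

With k=0 the constraint rows vanish and the jagged coefficients are recovered exactly; with a large k the estimated coefficients are pulled toward a straight line, at the cost of a poorer fit to the actual observations.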
It has been noted by Shiller (see footnote 32 above) that
imposing this condition requires that

$$k = \frac{\sigma}{\sigma_d}$$

where $\sigma$ is the standard deviation of u (see (40) above) and $\sigma_d$ is
the standard deviation of v ; i.e., of the prior probability distribu-
tion of the smoothing differences being imposed on the d parameters.
Since neither $\sigma$ nor $\sigma_d$ will be known in practice, approximate values
for both must be derived. An obvious approximation for $\sigma$ is the s
produced by an ordinary least squares analysis of (40) (see result (10)).
The choice of $\sigma_d$ is more difficult; for d=1 Shiller suggests an
initial "guesstimate" of

$$\sigma_d = \frac{8 \sum_{j=0}^{T} \tilde{\beta}_j}{(T+1)^2}$$

where the $\tilde{\beta}_j$ are extraneous or prior estimates of the lag parameters.
Note that all that is required to obtain $\sigma_d$ is an estimate of the sum
of the lag coefficients. Such an estimate can be obtained by estimating
equation (40) by ordinary least squares (see result (42) above).
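
The arithmetic behind the initial choice of k is brief. The Python sketch below assumes the guesstimate is read as $\sigma_d = 8 \sum_j \tilde{\beta}_j / (T+1)^2$ and that the tightness parameter is $k = \sigma / \sigma_d$, which is the value that equates the variances of u and kv. The function name and the input values are hypothetical.

```python
def tightness_parameter(s, prior_betas):
    """Return (sigma_d, k) for second-difference priors (d = 1).

    s           -- standard error of the estimate from OLS on (40)
    prior_betas -- prior or preliminary estimates of the T + 1 lag coefficients
    """
    n_coef = len(prior_betas)                         # T + 1
    sigma_d = 8.0 * sum(prior_betas) / n_coef ** 2    # initial guess for d = 1
    return sigma_d, s / sigma_d


# Hypothetical inputs: four lag coefficients summing to 0.875 and s = 800.
sigma_d, k = tightness_parameter(800.0, [0.5, 0.25, 0.125, 0.0])
print(sigma_d, k)
```

Only the sum of the prior coefficients and the number of lags enter the calculation, so any preliminary estimate with the same sum yields the same k.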




In practice, the analyst may wish to experiment with various
values of k (i.e., different values for $\sigma_d$, since $\sigma$ is usually
a given constant). It should be clear from (61) that the greater the
value of k, the greater the weight accorded to the constraint
conditions by the ordinary least squares estimation procedure, since
the (T-d) constraint "observations" are being rescaled by k without
a corresponding rescaling of the elements (or actual observations) in
y and X. Consequently, the resulting estimated distributed lag
coefficient vector ($\hat{b}$) will be different for different values of k,
with k=1 (equation (59)) being a special case of this set. Note
that k=0 implies that there are no effective constraints so that
the resulting $\hat{b}$ will be the same as obtained from an ordinary least
squares analysis of (40). If k is very large, the estimated
coefficients and their standard errors, as well as such equation
statistics as $R^2$ and the Durbin-Watson statistic, will approximate
those of a polynomial distributed lag analysis of the same degree
(that is, with d in (61) equal to r-1 in (49)).36/ This state-
ment also applies to the application of end-point constraints (see
below). If k is too large, singularity problems may arise.

Some idea of the effect the selected values of d and k can
have on the shape of the "smoothed" lag distribution can be obtained
by considering the following: if d=1 and k is large, then the
priors attach high probability to all shapes in which the slope changes
slowly (maximum probability to a straight line) and low probability to
jagged shapes. A jagged shape which lies within a narrow band around
a straight line is still accorded low probability, while a shape which
deviates markedly from a straight line but which does so gradually is
accorded high probability. If d=0, the priors assert that the lag is
unlikely to involve large jumps, and will probably proceed in small
steps.

36/ However, recall that the degree of smoothness priors is usually
chosen to be lower than is customary with a polynomial distributed
lag analysis; values for d of 0 or 1 are most common.

The constraints discussed to this point are uninformative as 
to the magnitude of particular lag coefficients. The analyst may, 
however, wish to impose end-point priors (similar to the near- and 
far-end zero restrictions discussed in Section 8.5) as additional 
constraints, implying that the first and/or last coefficients are 
themselves very small. This is accomplished by incorporating 
additional rows into the $R_d$ matrix. For example, a near-end zero
restriction (or "head constraint") requiring that $\beta_0 = 0$ 37/ can be
incorporated by including the row

$$0 = k \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \end{bmatrix} b ,$$

while a far-end zero restriction (or "tail constraint") requiring that
$\beta_T = 0$ can be incorporated by including the row

$$0 = k \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \end{bmatrix} b .$$

37/ Another form of the head constraint could be $\beta_{-1} = 0$. This can
be imposed from the smoothing constraint

$$(\beta_{-1} - \beta_0) - (\beta_0 - \beta_1) = 0$$

which can be written as

$$\beta_{-1} = 2\beta_0 - \beta_1 .$$

Imposing $\beta_{-1} = 0$ would therefore require the addition of a row

$$0 = k \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \end{bmatrix} b$$

into the $R_d$ matrix.


Smooth transition to these restrictions in the case of d=1 is assured
by adding the additional rows38/

$$0 = k \begin{bmatrix} -2 & 1 & \cdots & 0 & 0 \end{bmatrix} b$$

and

$$0 = k \begin{bmatrix} 0 & 0 & \cdots & 1 & -2 \end{bmatrix} b$$

respectively to $R_d$ in addition to the above end-point zero restrictions.
The imposition of such smoothness priors as these serves to connect the
first and final coefficients in the lag with the zero "coefficients"
beyond.39/


38/ These can be derived by using j=-1 and j=T-1 respectively in
the relevant smoothing equations and requiring that $\beta_{-1}$ and
$\beta_{T+1}$ respectively equal zero.

39/ See footnote 38. Note also that it is easy to modify the above
analysis to permit $\beta_0$ (or any subset of lag coefficients at
either end of the lag distribution) to "float"; that is, be
estimated without being subject to the specified smoothness priors.


The SHILLER Program


The SHARP APL program which performs Shiller lag analysis is
called SHILLER. It is stored in the public libraries in the workspace
32 PDLAG along with the polynomial distributed lag programs. To
execute the program, type

      SHILLER

and respond to the questions accordingly.

The user is first asked to enter the dependent variable, then
a matrix of the independent variables upon which no lag scheme is to
be imposed, each column of the matrix containing a variable. Vector
input is sufficient if only one such variable is to be included in
the regression specification. If no such unlagged variables are to
be included, a response of 0 is expected. The user is then asked
whether or not an intercept is to be included in the model; a
response of YES or NO is expected.

The imposed lag schemes must now be defined. In general, these
are to be entered as follows:

      V SLAG VAR

where VAR is the independent variable upon which the lag scheme is to
be imposed and V is a 5-element vector specifying

(i) the degree of smoothness priors (d),

(ii) the maximum number of lags (T),

(iii) an indication as to whether the smoothness priors are to
include a near-end zero restriction (or head constraint);
1 for yes, 0 for no,

(iv) an indication as to whether the smoothness priors are to
include a zero far-end restriction (or tail constraint);
1 for yes, 0 for no,

(v) the tightness parameter (k).


Various checks are performed as to the validity of the elements of V; 
any inadmissibilities are rejected and an appropriate error message 
printed. Any number of lag schemes may be imposed on any number of 
variables. Note that affirmative responses to (iii) and (iv) above 
also automatically invoke the smooth transition discussed above. 

The extended consumption example used in Example 9 is used to 
illustrate the application of the SHILLER program. Again a maximum 
of three lag coefficients is permitted and a zero far-end restriction 
is imposed. Second order smoothness priors (d = 1) are specified and 
the value used for the tightness parameter was calculated according 
to the initial "guesstimate" suggested above. Since this is a
relatively large value for k, results similar to the PDLAG results
(with r = 2) should be obtained. The results are presented as
Example 10. The lag distribution is quite similar to Example 9,
although it should be noted that the Shiller procedure does not
impose an exact zero restriction on the final lag parameter. The
equation statistics are also quite similar, including the low
Durbin-Watson statistic (see Section 8.5 for further discussion
on this point).



Example 10 


      SHILLER
ENTER DEPENDENT VARIABLE
⎕:
      CON
⎕:
      0
INCLUDE CONSTANT TERM (YES OR NO)?
YES
ENTER HYPOTHESIZED LAG SCHEME AND INDEPENDENT VARIABLE
⎕:
      1 3 0 1 1964.8 SLAG PDI
ANOTHER INDEPENDENT VARIABLE TO BE LAGGED?
NO


PRELIMINARY OLS ANALYSIS:
     R-SQUARED....  0.99404
     DURBIN-WATSON  0.52427


SHILLER LAG ANALYSIS -- PRIOR STRENGTH(S) = 1964.800

             COEFFICIENT   STD. ERROR    T-VALUE
CONSTANT      6741.8442     590.9572     11.4083
LAG:  0          0.6036       0.1250      4.8279
      1          0.2535       0.1080      2.3469
      2          0.0180       0.1041      0.1731
      3         ¯0.0754       0.1018     ¯0.7405

R-SQUARED.....................    0.99833
RBAR-SQUARED..................    0.99818
F-STATISTIC ( 4, 44)..........    6565.16281
STANDARD ERROR OF THE ESTIMATE    752.18054
COEFFICIENT OF VARIATION......    1.51369
DURBIN-WATSON STATISTIC.......    0.51032



CHAPTER IX 


INSTRUMENTAL VARIABLES 


9.1 Instrumental Variables (IV) 


In theory, the existence of a correlation between the error 
(disturbance) term and any of the explanatory variables in a 
regression is an extremely serious problem, since the ordinary 
least squares (OLS) estimators will be inconsistent. This means 
that the OLS procedure produces biased estimates and that the bias 
does not disappear as more information is collected (i.e., as the 
sample size is increased).1/

1/ More formally, a sufficient condition for a consistent estimator
is that both the bias and the variance should tend to zero as
the sample size increases (Johnston, p. 271); thus, an asymp-
totically unbiased estimator whose variance tends to zero as
the sample size increases is a consistent estimator (p. 272).

The matrix representation of the general model which is to
be estimated was presented in Chapter III as

$$y = X\beta + u \tag{6}$$

In theory, if the matrix X of explanatory variables is fixed --
that is, if the x variables are not subject to or correlated with
any error -- then OLS estimation presents no difficulties with res-
pect to bias. However, if for some theoretical reason any of the
x variables are either stochastic or are correlated with the dis-
turbance term u, then problems arise. Moreover, they are
exceptionally serious since they result not only in small sample
bias but also in inconsistent parameter estimates. The case of
stochastic regressors has been discussed (under errors of measure-
ment in the independent variables)2/, so it is the latter problem
which will be discussed in the following sections.

In this section the outline of an alternative estimation 
procedure which is appropriate under these conditions is presented. 
It is called the method of instrumental variables (IV). 

The problem arises and is a serious one because (theoreti- 
cally) some of the variables in the X matrix are correlated with 
u . A solution to the problem, then, is the elimination of this 
correlation. The method of instrumental variables replaces each 
of the "problem variables" in the X matrix by a new variable which 
ideally should be strongly correlated with the eliminated x and at 
the same time uncorrelated with u . These variables are called 


instruments. If the new matrix of instruments is denoted Z 3/ then
the instrumental variable estimates of $\beta$ (denoted $\tilde{\beta}$) are defined 4/
as

$$\tilde{\beta} = (Z'X)^{-1} Z'y \tag{62}$$

with

$$\operatorname{var}(\tilde{\beta}) = s^2 (Z'X)^{-1} (Z'Z) (X'Z)^{-1} \tag{63}$$

where

$$s^2 = \frac{(y - X\tilde{\beta})'(y - X\tilde{\beta})}{(n-k)} . \tag{64}$$

Since there is no correlation between Z and u, the estimators
given in (62) above are consistent estimators.

2/ See Section 6.6 and footnote 19, Chapter VIII.

3/ This bears no relationship to the matrix Z in Section 8.5.
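
For a single explanatory variable and no intercept (k = 1), results (62) through (64) reduce to ratios of sums, which makes them easy to check by hand. The Python sketch below works through that special case; the function name and the data are hypothetical, and it is not a substitute for the library program described in Section 9.5.

```python
def iv_estimate(y, x, z):
    """IV estimate and its variance for y = beta*x + u with instrument z (k = 1)."""
    zx = sum(z_i * x_i for z_i, x_i in zip(z, x))
    zy = sum(z_i * y_i for z_i, y_i in zip(z, y))
    zz = sum(z_i * z_i for z_i in z)
    beta = zy / zx                                          # result (62)
    residuals = [y_i - beta * x_i for y_i, x_i in zip(y, x)]
    s2 = sum(e * e for e in residuals) / (len(y) - 1)       # result (64), n - k with k = 1
    variance = s2 * zz / (zx * zx)                          # result (63)
    return beta, variance


# Hypothetical data.
y = [1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]
z = [2.0, 1.0, 4.0, 3.0]

print(iv_estimate(y, x, z))
print(iv_estimate(y, x, x))   # using x as its own instrument reproduces OLS
```

The second call shows that when a variable serves as its own instrument the formulae collapse to the ordinary least squares results, which is why exogenous variables can simply be repeated in the instrument matrix.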

In practice, the finding of variables to act as instruments 
is a very difficult task since u is unobservable! In addition, 
if the instruments are to be highly correlated with x variables, 
which in turn are correlated with u , then it is quite likely 
that the instruments will be correlated with u . Yet, if the 
instruments are not fairly highly correlated with the "problem
x's", then the sampling variances of the instrumental estimators
will be unnecessarily large.5/ Finally, since the instruments may
be measured in units totally different from those of the x variables,
their coefficients are often impossible to interpret. However, in
some circumstances the context of the problem suggests the selection
of appropriate instruments.6/ It is in these cases that the instru-
mental variables technique is most useful. The sections that follow
present a number of such applications. An appropriate estimating
program is outlined in Section 9.5.

4/ See Johnston, pp. 279-280.

5/ See Johnston, p. 281.


9.2 Instrumental Variables and Lagged Dependent Variables 


In Section 8.4 an equation was developed which led to the
introduction of a lagged dependent variable as an independent vari-
able. This raises the problem of stochastic regressors, since
$y_{t-1}$ depends on the stochastic error term $u_{t-1}$. As a conse-
quence, if the error term is not autocorrelated, then OLS estimates
are biased in small samples. If the error term is autocorrelated,
then OLS estimates are inconsistent (a more serious problem).
Since the equation also includes an independent variable $x_t$ which
is uncorrelated with the error term (by definition), $x_t$ or some
distributed lag on the x variable could be used to develop an
instrumental variable (IV) for $y_{t-1}$. In the latter case the
resulting predicted value $\hat{y}_{t-1}$ could be used. If the errors are
also autocorrelated, then this method can be combined with a genera-
lized least squares estimation procedure to produce efficient parameter
estimates.7/

6/ Johnston discusses one such application on pp. 283-286.

7/ See Sections 6.4 and 7.2 or Johnston, p. 319.


9.3 Indirect Least Squares (ILS) 


In this and following sections, the discussion of the problems
associated with a single equation is extended to consider a simulta-
neous system of equations in which the equation under review might
be included. Such a system may include only one additional equation
or many such equations, the latter being the basis of large scale
econometric models.8/ The discussion still concentrates on the
estimation of the parameters of the single equation, but in the con-
text of an entire simultaneous system.

Since each equation determines the values of a dependent
(often called endogenous) variable, there are as many of these
variables in the system as there are equations (providing that
the equations are independent). Consider two such variables, deno-
ted $y_1$ and $y_2$. The appropriate equations might look like:

$$\begin{aligned}
y_1 &= X_1\beta_1 + \gamma_1 y_2 + u_1 \\
y_2 &= X_2\beta_2 + \gamma_2 y_1 + u_2
\end{aligned} \tag{65}$$

where $X_1$ and $X_2$ are matrices of independent (often called exo-
genous) variables, and $\beta_1$ and $\beta_2$ are the associated parameter
vectors.9/ These are called the structural form equations. The
basic estimation problem is as follows: $y_1$ depends on $y_2$, but
$y_2$ is stochastic since it depends on $u_2$. Consequently, ordinary
least squares estimates of the coefficients in the first equation
will be subject to small sample bias. A similar situation holds for
the second equation, unless $\gamma_2 = 0$. (This latter case is a neces-
sary condition for a recursive model -- see Section 9.4.10/) If
$\gamma_2 \neq 0$, $y_2$ is also not independent of $u_1$, since it depends on
$y_1$, which in turn depends on $u_1$. Thus, in this case, OLS esti-
mates are also inconsistent. This is called the problem of simulta-
neous estimation.

8/ Such as the CANDIDE, RDX2, and TRACE models of the Canadian
economy. Estimation of a non-simultaneous system of equations
was discussed in Section 7.6.

9/ $X_1$ and $X_2$ may contain some of the same independent variables,
including a column of ones for the intercept. They may also
include lagged dependent variables, but in this case see Section
9.2.

10/ Also see Johnston, Section 13-1 for a discussion of recursive
models.

One possible solution is to solve the system of simultaneous
equations given by (65) to obtain y variables on the left-hand
side and only X variables on the right-hand side; viz.

$$\begin{aligned}
y_1 &= f_1(X_1, X_2, u_1, u_2) \\
y_2 &= f_2(X_1, X_2, u_1, u_2)
\end{aligned} \tag{66}$$

where $f_1$ and $f_2$ represent linear functional specifications.
These are called the reduced form equations. OLS could then be
applied to each of these equations in turn to obtain efficient
parameter estimates, since $X_1$ and $X_2$ are uncorrelated with
$u_1$ and $u_2$ (by definition). However, the parameters estimated
will be non-linear combinations of the original or structural para-
meters ($\beta_1$, $\beta_2$, $\gamma_1$, $\gamma_2$). The crucial question then is whether
estimates of the original parameters can be retrieved from the esti-
mated reduced form parameters. This is the problem of identification,
which is covered in detail elsewhere.11/ In short, if the parameters
of the equation under review are identified12/ it is possible to
obtain estimates of the structural parameters; if they are under-
identified it is not. However, if they are over-identified, more
than one estimate will be obtained, i.e. the estimates are not
unique. Consequently, this method of applying ordinary least squares
to the reduced form equations (66), which contain all the x (or
exogenous) variables in the system on the right-hand side of each
equation, and subsequently solving for unique values of the struc-
tural parameters13/ is only possible when all the parameters are
just-identified. This estimation method is called indirect least
squares.


11/ See Johnston, Sections 12-2 through 12-4. 


12/ This term is used to cover the cases of both just- and over- 
identified. 


13/ Although derived from unbiased estimates of the reduced form 
parameters, these estimates will not themselves be unbiased. 
They will, however, be consistent (Johnston, pp. 345-346). 
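
For the two-equation system (65) the reduced form (66) can be written out explicitly. The following derivation is a sketch that assumes $\gamma_1 \gamma_2 \neq 1$; it is obtained by substituting each structural equation into the other and collecting terms.

```latex
\begin{aligned}
(1 - \gamma_1\gamma_2)\, y_1 &= X_1\beta_1 + \gamma_1 X_2\beta_2 + u_1 + \gamma_1 u_2 , \\
(1 - \gamma_1\gamma_2)\, y_2 &= \gamma_2 X_1\beta_1 + X_2\beta_2 + \gamma_2 u_1 + u_2 .
\end{aligned}
```

Dividing through by $(1 - \gamma_1\gamma_2)$ shows that each reduced form coefficient is a ratio such as $\beta_1 / (1 - \gamma_1\gamma_2)$ or $\gamma_1\beta_2 / (1 - \gamma_1\gamma_2)$, which is the sense in which the estimated parameters are non-linear combinations of the structural parameters. It also shows that both reduced form errors involve $u_1$ and $u_2$ together.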




9.4 Recursive Models 


A fairly common specification of (65) above involves, for
example, $\gamma_2 = 0$. In this case $y_2$ depends only on $X_2$ and
$u_2$. Since $X_2$ is uncorrelated with $u_2$ (by assumption), OLS
can be applied to obtain estimates of the parameters in this struc-
tural equation. The only remaining problem in the first equation
is that $y_2$ depends on $u_2$, i.e. it is stochastic. However, by
definition (from result (4)),

$$y_2 = \hat{y}_2 + e_2$$

where $\hat{y}_2$ is the predicted value of the equation already estimated
and is independent of the errors $u_2$. As a consequence, $\hat{y}_2$ can
be used as an instrument for $y_2$ in the first equation, which can
then be estimated by OLS. It is only in the case of recursive
simultaneous models that OLS provides a satisfactory estimation
technique, and even then it must be applied to each equation in a
step-wise fashion using previously obtained predicted values of the
endogenous variables as instruments. This assumes that the true
error terms across the equations are not correlated.14/

14/ Equation (13-2), Johnston, p. 377. The conditions for OLS esti-
mation equation by equation for a system of non-simultaneous
equations were outlined in Section 7.6.



9.5 Two-Stage Least Squares (2SLS) 


In the case where $\gamma_2 \neq 0$, OLS estimates will be biased and
inconsistent, and alternative estimation procedures must be adopted.
When the parameters are just-identified, indirect least squares
can be applied, but when they are over-identified two-stage least
squares (2SLS) is the most common estimation technique.

Again it is necessary to purge the right-hand endogenous
variable of its stochastic component in such a way that the instru-
mental variable replacement no longer depends on the error term in
that equation. Consider the first equation given in (65) -- there
are two difficulties: (i) $y_2$ depends on $u_2$ via the second
equation and is therefore stochastic, so that OLS estimators are
subject to small sample bias, and (ii) $y_2$ depends on $u_1$ since
it depends on $y_1$ in the second equation, and $y_1$ in turn depends
on $u_1$ in the first equation. As a result, OLS estimators are both
biased and inconsistent. However, if $y_1$ and $y_2$ are regressed
on all the exogenous variables in the system using OLS -- that is,
if the reduced form equations (66) with no zero parameter restric-
tions are estimated using OLS -- then the instruments $\hat{y}_1$ and $\hat{y}_2$
(see above) can be obtained. These instruments do not depend on
either $u_1$ or $u_2$. Hence, if these are used as instruments for
$y_1$ and $y_2$ when the structural equations (65) are estimated by
OLS in a second stage, then both the problem of small sample bias
and the problem of inconsistency disappear. This is the method
of two-stage least squares. Since this is a particular application
of instrumental variables, the formulae (62) through (64) can be
used to obtain the relevant parameter estimates; all other equa-
tion statistics ($R^2$, etc.) can be calculated based on $e = y - X\tilde{\beta}$
where $\tilde{\beta}$ is the IV parameter estimator given by result (62).

It should be noted that if the first stage equations fit
well (high $R^2$, etc.), there will probably be little difference
between the 2SLS estimates and the OLS estimates of the structural
parameters.
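
The two stages can be traced in a deliberately small case: one right-hand endogenous variable, one exogenous variable in the system, and no intercepts. The Python sketch below uses hypothetical function names and data, and is meant only to show that the second-stage OLS slope coincides with the instrumental variables formula (62) when the first-stage fitted values are used as the instrument.

```python
def slope(y, x):
    """OLS slope of y on x with no intercept."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y)) / sum(x_i * x_i for x_i in x)


def two_stage(y1, y2, z):
    """2SLS estimate of gamma in y1 = gamma*y2 + u1, with one exogenous variable z."""
    pi_hat = slope(y2, z)                      # stage 1: reduced form for y2
    y2_hat = [pi_hat * z_i for z_i in z]       # fitted values, purged of the error
    return slope(y1, y2_hat), y2_hat           # stage 2: OLS with y2_hat in place of y2


# Hypothetical data.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [1.5, 1.8, 3.4, 3.9, 5.6]
y1 = [0.9, 1.1, 1.6, 2.2, 2.6]

gamma_2sls, y2_hat = two_stage(y1, y2, z)
# The same number from the IV formula (62), using y2_hat as the instrument for y2.
gamma_iv = sum(a * b for a, b in zip(y2_hat, y1)) / sum(a * b for a, b in zip(y2_hat, y2))
print(gamma_2sls, gamma_iv)
```

The two numbers agree because the first-stage residuals are orthogonal to the fitted values, so replacing $y_2$ by $\hat{y}_2$ in the second stage and using $\hat{y}_2$ as an instrument for $y_2$ are the same calculation.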


The STAGE2 Program


The STAGE2 program, located in the public libraries in work-
space 32 REGRESSION, uses instrumental variables as a method of obtain-
ing consistent parameter estimates (i.e., results (62) to (64)).15/
The program is executed by typing

      Y STAGE2 X

where Y is a vector of length n representing the dependent variable,
and X is a matrix of n rows and 2k columns. The first k columns
represent k independent variables, and the last k columns the
instruments to be applied to each of these variables. The choice
of appropriate instruments must be specified by the user.16/

15/ Consequently the user is cautioned against interpreting the
results as two-stage least squares estimates. For these
estimators see Johnston, Section 13-2. Their implementation
is discussed in the following Section 9.6.

As an example, suppose a data matrix M contains 35 observa-
tions on each of 20 variables. Three ordinary least squares equations
are estimated, and following the estimation of each the calculated
y values ($\hat{y}$) resulting from the regression are included by the user
in the matrix M as columns 21, 22 and 23. Now suppose that the
analysis required is an instrumental variables analysis using vari-
able 1 (column 1 of M) as the dependent variable, and variables
12, 6 and 15 as independent variables. Further, suppose that variable
23 is to be the instrument applied to variable 12, variable 21 the
instrument to be applied to variable 6, and variable 15 is to be its
own instrument. The results of this analysis can be obtained by
typing

      M[;1] STAGE2 M[;12 6 15 23 21 15]

Full regression output similar to that provided by the REGR pro-
gram is printed.17/ Optional output features and the inclusion of
an intercept term are again controlled by the state setting functions
in 32 REGRESSION (see Section 3.1).


16/ Useful information is contained in a paper by J.D. Sargan, 
"The Estimation of Economic Relationships Using Instrumental 
Variables," Econometrica, Vol. 26 (1958), pp. 393-415. 


17/ See Example 1, Section 3.3. 




9.6 Three-Stage Least Squares (3SLS) 


In the preceding discussion (centered on equation (65)) it was 
assumed that $u_1$ and $u_2$ were independent error terms. The relaxation
of this assumption of error independence between the equations in a 
system of equations was introduced in Section 7.6 above where a 
non-simultaneous system (or group) of equations was considered. The 
same ideas apply to a simultaneous system of equations; namely, that 
greater efficiency in estimation can be achieved by making use of this 
information. The resulting technique for a simultaneous system of 
equations is called three-stage least squares (3SLS). 

Essentially this technique involves the application of gener- 
alized least squares (see Section 7.1) to a system of equations, each 
of which has first been estimated using two-stage least squares (see 
Section 9.5). Stage 1, therefore, requires the estimation of the 
reduced form of the model (i.e., equation (66)) and the extraction of 
the fitted values of the endogenous variables in the model. Stage 2 
requires the use of these fitted values to obtain 2SLS parameter 
estimates. In the third stage, these parameter estimates are used 
to generate estimated residual vectors for each equation so that an 
estimate of the (cross-equation) error variance-covariance matrix 
can be obtained (denoted Σ in Section 7.6) and generalized least
squares is then applied to the system to obtain the most efficient 


parameter estimates. The resulting formulae are quite complicated. 18/ 


18/ See Johnston, pp. 396-398. 




It should be noted that all identities and under-identified equations 
should be removed from the system before applying 3SLS. Where all 
structural errors are uncorrelated across equations, the 3SLS estimates 
will be the same as the 2SLS estimates. It may also be noted that the 
residuals from the estimated 3SLS equations could be used to obtain 

new estimates of the (cross-equation) variance-covariance matrix and 
GLS reapplied, thus yielding new estimates of the structural coefficients. 
This procedure could be repeated many times, resulting in a procedure 


which has been called iterative 3SLS. 
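
The third-stage ingredient described above, an estimate of the cross-equation error variance-covariance matrix built from the residual vectors, can be sketched in Python. This is an illustration only, not the STAGE3 program; it assumes each element is computed as e_i'e_j / n (other divisors are in use), and the function name and the residuals are hypothetical.

```python
def cross_equation_covariance(residuals):
    """residuals: one residual vector per equation, all of length n.
    Returns the matrix whose (i, j) element is e_i'e_j / n."""
    n = len(residuals[0])
    return [[sum(a * b for a, b in zip(ei, ej)) / n for ej in residuals]
            for ei in residuals]


# Hypothetical residuals from two estimated equations.
e1 = [0.5, -0.5, 1.0, -1.0]
e2 = [1.0, -1.0, 0.5, -0.5]
sigma = cross_equation_covariance([e1, e2])
print(sigma)

# If every off-diagonal element were zero, applying GLS with this matrix
# would add nothing to the equation-by-equation 2SLS estimates.
errors_uncorrelated = all(abs(sigma[i][j]) < 1e-12
                          for i in range(2) for j in range(2) if i != j)
print(errors_uncorrelated)
```

For iterative 3SLS the same function would be reapplied to the residuals of each new set of estimates.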


The STAGE3 Program 


The implementation of 3SLS is facilitated by the use of the 
STAGE3 program located in the workspace 32 REGRESSION. To illustrate 
the procedure, a six equation system based on a model constructed by 
L.R. Klein - often referred to as Klein's Model I 19/ - is presented
below. The resulting program output is shown as Example 11. 

The model consists of three behavioural equations - a consumption 
function, an investment function and a demand-for-labour function - and 


three identities - defining total production, total profit and the total 


wage bill. In general, the model can be expressed as: 


19/ A brief discussion of this model is presented in Section 9.2 of 
H. Theil, Principles of Econometrics, pp. 432-438 where further 
references can be obtained. In the example presented below
the capital stock identity is replaced by a wage bill identity 
(as is done in H. Theil, Principles of Econometrics, p. 455). 


$y_1 = f_1(y_4, y_5, x_5, x_8)$
$y_2 = f_2(y_4, x_5, x_6, x_8)$
$y_3 = f_3(y_6, x_7, x_4, x_8)$

where $f_1$, $f_2$ and $f_3$ represent linear functions and

$y_4 = y_6 - y_3 - x_2$
$y_5 = y_3 + x_1$
$y_6 = y_1 + y_2 + x_3$


The definition of each of these variables is as follows: 


Endogenous Variables (y's):

y1   Aggregate consumption
y2   Net investment
y3   Wage bill paid by private industry
y4   Total profit
y5   Total wage bill
y6   Total production by private industry

Predetermined Variables (x's):

x1   Wage bill paid by government
x2   Business taxes
x3   Government non-wage expenditure
x4   Time
x5   Profit in the previous year
x6   Beginning-of-year capital stock
x7   Private production in the previous year
x8   Intercept




The model is estimated on annual data for the United States for
the period 1921-41.20/ Note that it is dynamic in the sense that it
contains lagged endogenous variables. The name "predetermined" is
assigned to a matrix of variables which includes lagged endogenous and
exogenous variables. The resulting program output for this system of
equations is presented as Example 11. Note that the non-default state
settings21/ COVARIANCE and NOCONSTANT were in effect for this example
output and that both the 2SLS and 3SLS results are presented. The
former non-default setting requires the display of the
variance-covariance matrix of the estimated parameters for each
estimation stage (i.e., 2SLS and 3SLS), while the latter non-default
setting permits behavioural equations which do not contain an
intercept to be included in the system. In the current example,
however, x8 appears in all three behavioural equations and so the
non-default setting is not strictly necessary. Note that all
identities have been removed from the system and the interested
reader can check that all of the behavioural equations are
identified. The program output includes the parameter estimates,
standard errors and t-values,22/ together with the equation
statistics for each estimation stage. Therefore the STAGE3 program
can be used to implement both 2SLS and 3SLS.


20/ Data for 1920 is used for the lagged endogenous variables. The 
data is presented in H. Theil, Principles of Econometrics, p. 455. 


21/ See Section 3.1 for a discussion of default settings. 


22/ These results are presented in H. Theil, op. cit., p. 517. See also 
A. Zellner and H. Theil, "Three-Stage Least Squares: Simultaneous 
Estimation of Simultaneous Equations," Econometrica, Vol. 30 (1962),
pp. 54-78. 





Example 11

      Y STAGE3 X
TWO-STAGE, THREE-STAGE OR BOTH (TYPE 2, 3 OR BOTH)?
BOTH
NUMBER OF STRUCTURAL (NON-DEFINITION) EQUATIONS 3
COLUMN NUMBERS OF DEPENDENT VARIABLES FOR THE 3 EQUATIONS 1 2 3
ENTER COLUMN NUMBERS OF THE FOLLOWING:
JOINTLY DEPENDENT VARIABLE(S) ON R.H.S. OF EQ. 1: 4 5
INDEPENDENT VARIABLE(S) ON R.H.S. OF EQ. 1: 5 8
JOINTLY DEPENDENT VARIABLE(S) ON R.H.S. OF EQ. 2: 4
INDEPENDENT VARIABLE(S) ON R.H.S. OF EQ. 2: 5 6 8
JOINTLY DEPENDENT VARIABLE(S) ON R.H.S. OF EQ. 3: 6
INDEPENDENT VARIABLE(S) ON R.H.S. OF EQ. 3: 7 4 8


2SLS PARAMETER ESTIMATION

EQUATION  VARIABLE  COEFFICIENT  STD. ERROR   T-VALUE
   1        Y04        0.0173      0.1180      0.1468
            Y05        0.8102      0.0402     20.1289
            X05        0.2162      0.1073      2.0158
            X08       16.5548      1.3208     12.5340

   2        Y04        0.1502      0.1732      0.8672
            X05        0.6159      0.1628      3.7838
            X06       ¯0.1578      0.0361     ¯4.3677
            X08       20.2782      7.5427      2.6885

   3        Y06        0.4389      0.0356     12.3165
            X07        0.1467      0.0388      3.7767
            X04        0.1304      0.0291      4.4746
            X08        1.5003      1.1478      1.3071

VARIANCE-COVARIANCE MATRIX OF 2SLS COEFFICIENTS
[12 x 12 matrix; entries illegible in the source]

STATISTICS FOR EQUATION 1
MULTIPLE CORRELATION COEFFICIENT (R*2)      0.976711
CORRECTED R*2                               0.972601
OVERALL F-STATISTIC (3,17)                237.6[illegible]
STD. ERROR OF THE ESTIMATE                  1.135659
COEFFICIENT OF VARIATION                    2.103257
DURBIN-WATSON STATISTIC                     1.485072

STATISTICS FOR EQUATION 2
MULTIPLE CORRELATION COEFFICIENT (R*2)      0.884884
CORRECTED R*2                               0.864569
OVERALL F-STATISTIC (3,17)                 43.559005
STD. ERROR OF THE ESTIMATE                  1.307149
COEFFICIENT OF VARIATION                  103.195981
DURBIN-WATSON STATISTIC                     2.085334

STATISTICS FOR EQUATION 3
MULTIPLE CORRELATION COEFFICIENT (R*2)      0.987414
CORRECTED R*2                               0.985193
OVERALL F-STATISTIC (3,17)                444.558573
STD. ERROR OF THE ESTIMATE                  0.767155
COEFFICIENT OF VARIATION                    2.109778
DURBIN-WATSON STATISTIC                     1.963816


3SLS PARAMETER ESTIMATION

EQUATION  VARIABLE  COEFFICIENT  STD. ERROR   T-VALUE
   1        Y04        0.0479      0.1146   [illegible]
            Y05     [illegible]    0.0386     21.1584
            X05        0.1897      0.1046      1.8133
            X08       16.1926      1.3003     12.4534

   2        Y04        0.2110      0.1699      1.2497
            X05        0.5669      0.1588      3.5691
            X06       ¯0.1472      0.0347   [illegible]
            X08       17.9912   [illegible]    2.4730

   3        Y06        0.4282      0.0347     12.3415
            X07        0.1544      0.0377      4.0919
            X04        0.1357      0.0287      4.7320
            X08        1.6936      1.1415      1.4837

VARIANCE-COVARIANCE MATRIX OF 3SLS COEFFICIENTS
[12 x 12 matrix; entries illegible in the source]

STATISTICS FOR EQUATION 1
MULTIPLE CORRELATION COEFFICIENT (R*2)      0.978170
CORRECTED R*2                               0.974317
OVERALL F-STATISTIC (3,17)                253.910119
STD. ERROR OF THE ESTIMATE                  1.099533
COEFFICIENT OF VARIATION                    2.036315
DURBIN-WATSON STATISTIC                     1.492690

STATISTICS FOR EQUATION 2
MULTIPLE CORRELATION COEFFICIENT (R*2)      0.900398
CORRECTED R*2                               0.882821
OVERALL F-STATISTIC (3,17)                 51.226328
STD. ERROR OF THE ESTIMATE                  1.215882
COEFFICIENT OF VARIATION                   95.990711
DURBIN-WATSON STATISTIC                     2.093288

STATISTICS FOR EQUATION 3
MULTIPLE CORRELATION COEFFICIENT (R*2)      0.987314
CORRECTED R*2                               0.985076
OVERALL F-STATISTIC (3,17)                441.029277
STD. ERROR OF THE ESTIMATE                  0.770180
COEFFICIENT OF VARIATION                    2.118096
DURBIN-WATSON STATISTIC                     2.038255




9.7 Structurally Ordered Instrumental Variables (SOIV) 


One additional problem may arise in the implementation of 2SLS 
or 3SLS for large systems of equations. In the first stage, OLS is 
applied to equations of the reduced form type, where each endogenous 
variable is regressed on all of the x variables (exogenous and lagged 
endogenous) in the system. If there are a number of equations, the 
total number of these variables could be quite large and could 
possibly exceed the number of observations available. This is called 
the problem of undersized samples. If this is the case, the first 
stage equations needed to obtain the instruments cannot be estimated. 
As a consequence, some method must be employed which reduces the 
number of x variables to be used in the first stage estimation 
procedure. Two methods will be discussed here, one in this section 
and one in the following section.

Fisher has proposed the method of structurally ordered 
instrumental variables (SOIV).23/ Basically, this method suggests
a specific ordering of the x variables according to how "far 
removed" in the system they are from influencing the dependent 
variable in the equation under review. Variables actually included 
in the equation are labelled zero order variables, variables which 


influence the y variables on the right-hand side of the equation 


23/ F.M. Fisher, "Dynamic Structure and Estimation in Economy-wide 
Econometric Models", in J.S. Duesenberry, G. Fromm, L.R. Klein, 


and E. Kuh (editors), The Brookings Quarterly Econometric Model 
of the United States, 1965, chapter 15. 




in question are labelled first order variables, those which influence 
the y variables on the right-hand side of the equations previously 
referred to are labelled second order, and so on. The analyst, then, 
simply successively includes variables, starting with the lowest 
order, until degrees of freedom are no longer available in the first 
stage regressions. Naturally, since the number of variables in 

each category increases rapidly, it is unlikely that variables 

beyond the second order will be chosen for the first stage regression. 
As long as the necessary degrees of freedom are available, a new 
group of variables should be included. With the reduced form equa- 
tions estimated, structural estimates are then obtained following 


the procedures outlined in Sections 9.5 and 9.6. 
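
The selection rule just described (add whole groups of variables, lowest order first, while degrees of freedom remain) can be sketched in Python. The sketch assumes the analyst has already classified the x variables by order; the function name, the `min_df` threshold and the variable names are hypothetical and are not part of any program in the workspace.

```python
def select_instruments(ordered_groups, n_obs, min_df=1):
    """ordered_groups: lists of variable names, zero order first.
    A group is added only if at least min_df degrees of freedom
    would remain in the first-stage regression."""
    chosen = []
    for group in ordered_groups:
        if n_obs - (len(chosen) + len(group)) < min_df:
            break  # this group would exhaust the degrees of freedom
        chosen.extend(group)
    return chosen


# Hypothetical ordering of eight x variables into three groups.
groups = [["z1", "z2"],          # zero order: appear in the equation itself
          ["z3", "z4", "z5"],    # first order
          ["z6", "z7", "z8"]]    # second order

ample = select_instruments(groups, n_obs=21)   # enough observations for all groups
scarce = select_instruments(groups, n_obs=6)   # undersized sample
print(ample)
print(scarce)
```

With only six observations the second-order group is excluded, which is the situation the method is designed to handle.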


9.8 Principal Components 


The second method for dealing with the problem of undersized 
samples is called principal components. Since this procedure 
involves mathematical computations which may be unfamiliar, the 
analyst is referred to a text for a detailed summary of the formal 
techniques.24/ In essence, this method decomposes the variability


to be found in the variables of a matrix X (of independent vari- 


ables only) into an equal number of uncorrelated components. The 


24/ E.g., Johnston, Section 11-1. 




great advantage of this method is that the first three or four 
components might possibly capture 99% of the variance; if so, the 
number of independent variables has probably been considerably 
reduced, while nearly the same amount of "information" (i.e., vari- 
ability) has been captured by the components, which are considerably 
fewer in number. Not only is this potentially useful for dealing 
with the problem of undersized samples, but it may also be useful 
for dealing with the problem of multicollinearity (Section 6.2), 
since essentially the same information can be captured by fewer 
uncorrelated components. The disadvantage of its use for this pur- 
pose, however, is that it is difficult to interpret the estimated 
coefficients resulting from a regression of the same dependent 
variable on the new components. 

In the case of undersized samples, the matrix of all x 
variables in the system is entered and the principal components 
obtained (see below). An appropriate subset of the components, 
commencing with component 1, can then be selected to serve as the 
right-hand variables for stage one of 2SLS or 3SLS. The remaining 
estimation proceeds as before. 

A warning should be issued concerning the use of principal 
components. Since the method imposes an arbitrary normalization, 
different results are obtained according to whether the information 
on the independent variables is entered in original units, in terms 


of deviations around the means, in standardized units (units 




divided by standard deviations), or as a correlation matrix. 
Different programs accept data in different forms and the analyst 
should ascertain which method is being employed by the particular 


program being used. 


The PRINCIPAL Program 


The PRINCIPAL program, located in the public libraries 
in the workspace 34 PRINCIPAL, performs principal component analysis 
on an input matrix M of independent variables. The program is 


executed by typing 
PRINCIPAL M 


where M is a matrix of k columns, each representing an independent 
variable. The user is asked to indicate those columns of M on 
which an analysis is required. 

The output consists of three tables. Table 1 is a matrix 


of four columns containing 


1) Component number; 
2) Eigenvalue associated with that component; 


3) Variation explained by that component as a 
percentage of the total variation; and 


4) Cumulative percentage variation explained. 




Table 2 is a [k x k] matrix of eigenvectors (weights) for each
component. Table 3 is an n (number of observations) by k (number
of principal components) matrix of principal components, each 
column representing a component. Quite frequently a number of 
components contain only zeros; in this case the first all-zero 
component is printed, along with a message that all remaining com- 
ponents contain only zeros. 
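
The last two columns of Table 1 follow directly from the eigenvalues, and the cumulative column indicates how many components to retain. The Python sketch below illustrates this using the eigenvalues printed in Example 12; the function names are hypothetical and the code is not part of the PRINCIPAL program.

```python
def variance_table(eigenvalues):
    """Return rows of (component, eigenvalue, percentage, cumulative percentage)."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for number, value in enumerate(eigenvalues, start=1):
        share = 100.0 * value / total
        cumulative += share
        rows.append((number, value, share, cumulative))
    return rows


def components_needed(eigenvalues, target_percent):
    """Smallest number of leading components reaching the target percentage."""
    for number, _, _, cumulative in variance_table(eigenvalues):
        if cumulative >= target_percent:
            return number
    return len(eigenvalues)


# Eigenvalues as printed in Example 12 (eight variables, three zero components).
eigenvalues = [16114.9016514, 10097.5735379, 4288.3748480,
               3033.3488795, 420.9677500, 0.0, 0.0, 0.0]

table = variance_table(eigenvalues)
for row in table:
    print("%d %16.7f %12.7f %12.7f" % row)

needed_95 = components_needed(eigenvalues, 95.0)
needed_99 = components_needed(eigenvalues, 99.0)
print(needed_95, needed_99)
```

In this example four components account for more than 95% of the variation, while a 99% target requires the fifth.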

Finally, the program requests whether or not a regression 
analysis is to be performed using the components. If so, the user 
must indicate which components are desired as independent variables 
and then save the workspace. The desired components are stored in 
a matrix M. With a negative response, all principal components 
are stored in a matrix M, and the workspace must be saved if these 
are required for future use. 

Example 12 illustrates the output of this program. The program 
accepts raw data (that is, a matrix X ) as input and internally 
transforms it into deviations around the means; this forms the basis 
for the calculations. Consequently, the weights presented in the 
table of eigenvectors are the weights to be attached to these devia- 


tions in the construction of each component. 


Example 12

      PRINCIPAL MATRIX
INDICATE COLUMNS ON WHICH YOU WISH PRINCIPAL COMPONENT ANALYSIS PERFORMED
⎕:
      ⍳8

COMPONENT     EIGENVALUES     ASSOCIATED VARIATION    CUMULATIVE
                                 AS PERCENTAGE        PERCENTAGE
    1       16114.9016514         47.4593508          47.4593508
    2       10097.5735379         29.7379590          77.1973098
    3        4288.3748480         12.6295208          89.8268306
    4        3033.3488795          8.9333942          98.7602248
    5         420.9677500          1.2397752         100.0000000
    6           0.0000000          0.0000000         100.0000000
    7           0.0000000          0.0000000         100.0000000
    8           0.0000000          0.0000000         100.0000000

EIGENVECTORS (WEIGHTS) FOR COMPONENT NUMBER
[table of weights; entries illegible in the source]

PRINCIPAL COMPONENT NUMBER
[table of component values; entries illegible in the source]
THE REMAINING 2 PRINCIPAL COMPONENTS CONSIST ENTIRELY OF ZEROES.

DO YOU WANT TO RUN A REGRESSION IMMEDIATELY?
NO
YOUR PRINCIPAL COMPONENTS ARE STORED IN A MATRIX M.
IF YOU WISH TO USE THEM AGAIN, SAVE THIS WORKSPACE.

      )WSID COMPONENTS
WAS 34 PRINCIPAL
      )SAVE
14.22.49 05/31/77 COMPONENTS




CHAPTER X 


SUMMARY 


It is hoped that the contents of this manual will encourage 
the efficient and correct use of the tools of econometrics. 
Throughout the presentation of the material, the orientation has 
been towards the application of the theory and results, both with 
respect to interpretation and use with a computer. Although the 
SHARP APL System has been employed to illustrate the computer appli- 
cations, it is hoped that the manual will also prove useful to those 
who have access to other systems. 

The material has concentrated on the estimation and use of 
a single econometric equation. As was pointed out in the Introduc- 
tion (Chapter I), the practice of econometrics involves four main 
steps: (i) specification, (ii) data collection, (iii) estimation, 
and (iv) hypothesis testing. Little has been said about the first 
step, except to outline the implications of both linear and curvi- 
linear specifications (Sections 4.8 through 4.10), and to note 
many times that it is crucial to start with a correctly specified 
equation -- the consequences of an incorrect specification are 
serious (Section 6.5). Ideally, the specification should be sugges- 


ted by the appropriate theory, but the knowledge of previous research 




results and institutional arrangements can also be very useful. 

The second step also received little attention, since it 
is a task which is specific to each project. However, it is 
always important to check that the data being used is the relevant 
data, and that it is measuring what is in fact desired. Chapter 
II was devoted to the preparation of data for use in the basic 
estimation process, while Chapter IV was concerned with the myriad 
data manipulations which are possible and may be necessary for sub- 
sequent work. 

The bulk of the material in this manual has been concerned 
with step (iii) - estimation. Chapter III was devoted to straight- 
forward ordinary least squares (OLS) estimation, and Chapter VI 
extended this discussion to cover many of the problems that are 
encountered in practice. The presentation included the detection 
and elimination (where possible) of the difficulties. More compli- 
cated estimation problems were dealt with in Chapter VII where 
applications of generalized least squares were reviewed, in 
Chapter VIII, where the problem of estimating lagged relationships 
was discussed, and in Chapter IX, where the implications of 
estimating an equation which is part of a system of equations were 
considered. 

An entire chapter (Chapter V) and numerous paragraphs 
elsewhere in the text were devoted to step (iv). Hypothesis 
testing is concerned with the validation of the estimated equation, 


and as such provides a guide to its appropriateness for use in 




subsequent work. One such use, prediction (or forecasting), was
discussed in Chapter VIII.

The contents of this document should cover almost all the 
problems and questions that are likely to arise in the estimation 
of a single linear econometric equation;1/ this type of estimation 
comprises the vast majority of the analysis currently being per- 
formed using these techniques. Every attempt has been made to 
present the material as clearly and unambiguously as possible, 
with extensive use being made of footnotes to reference the rele- 
vant proofs and elaborations contained in a text. One main text 
has been referenced throughout and the authors have endeavoured 
to maintain a consistent notation, both internally and with the 
text, in order to minimize the amount of duplication and, hopefully, 
encourage maximum use of the contents of both. Unfortunately, the 
analyst must still conquer an often bewildering array of notation 
if the material is to be efficiently and correctly applied. 

Econometrics has been termed the "art of the error term"; 
since the "true" error term (or stochastic disturbance) is never 
known, it is truly an art! Although the techniques appear to be 
clear-cut (even if they seem complicated) and the aura of the com- 
puter imparts an exactness that is at least as accurate as in any 


other discipline, it is always important to remember that 


1/ Non-linear estimation is briefly discussed in the Appendix. 




subjectivity comes into play at the very first step of the analysis - 
specification of the relationship - and is employed in many ways 
throughout the analysis. What is satisfactory to one analyst may 

be unsatisfactory to another. The best judgement (apart from that 
of your boss!) as to whether a piece of econometric analysis should 
be disseminated can probably be obtained by confronting it with the 
question "Given the information presently available, does it make 
sense?" An affirmative response probably indicates that the tech- 
niques of econometrics presented in this document have not been 


misused. 




APPENDIX 
NON-LINEAR ESTIMATION 


Whenever an equation is non-linear in the parameters, as distinct 
from non-linear in the variables,1/ the problem of estimation by least
squares becomes quite difficult. Generally, results such as result (9), 
namely 


$\operatorname{var}(\hat{\beta}) = s^2 (X'X)^{-1}$          (9)


are not available and approximations based on linear least squares are 
used. Another problem in the non-linear case is that the least squares 
estimators are no longer unique, since the mathematical representation 
of the residual sum of squares at each iteration may have multiple 
minima, i.e., a number of different estimated values of the parameters 
of the model may result in the same residual sum of squares. Usually 
the matrix $\hat{\beta}$ of parameter estimates cannot be computed directly
using

$\hat{\beta} = (X'X)^{-1} X'y$ ,          (8)


but is instead approached through an iterative technique such as the 


1/ Such as is the case with polynomial regression, reciprocal 
regression, and log linear regression (Sections 4.6, 4.8 and 
4.9). An example of an equation that is non-linear in 
parameters is equation (27) on page 126. This is considered 
again below. 




Gauss method,2/ the method of steepest descent, or a method which
combines the Gauss method and the method of steepest descent.3/
This latter method is used in the following program.


The MARQUARDTP Program

Consider the following problem. 4/ The energy, E , radiated 


from a carbon filament lamp per square centimetre per second was
measured at six filament temperatures T . The observed data are 
given in the following table, where T is the absolute temperature 


of the filament in thousands of degrees Kelvin: 


T     1.309   1.471   1.490   1.565   1.611   1.680

E     2.138   3.421   3.597   4.340   4.882   5.660


The equation

$E = aT^b$          (67)


is hypothesised by the experimenter. A plot of the data on log-log 


2/ See G.W. Booth, G.E.P. Box, M.E. Muller, and T.I. Peterson, 
Forecasting by Generalised Regression Methods, Non-linear 
Estimation (New York, I.B.M., 1959).

3/ D.W. Marquardt, "An Algorithm for Least-Squares Estimation of
Nonlinear Parameters," Journal of the Society for Industrial
and Applied Mathematics, Vol. 11, No. 2 (1963), pp. 431-441.

4/ Taken from E.S. Keeping, Introduction to Statistical Inference 
(Princeton, D. Van Nostrand Co. Inc., 1962). 




paper indicates that a and b are approximately 0.725 and 4.0, 
respectively. Hence these values can be used as initial parameter 
estimates in the accompanying example. 
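
The graphical step (reading a and b from a log-log plot) has a simple numerical counterpart: taking logarithms of (67) gives ln E = ln a + b ln T, which can be fitted by ordinary least squares. The Python sketch below is an illustration under that assumption, using the six observations tabulated above; it is not part of any program in the workspace, and the variable names are hypothetical.

```python
import math

T = [1.309, 1.471, 1.490, 1.565, 1.611, 1.680]   # thousands of degrees Kelvin
E = [2.138, 3.421, 3.597, 4.340, 4.882, 5.660]   # radiated energy

log_t = [math.log(t) for t in T]
log_e = [math.log(e) for e in E]
n = len(T)
mean_x = sum(log_t) / n
mean_y = sum(log_e) / n

# Simple regression of ln E on ln T: the slope estimates b, the intercept ln a.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_t, log_e))
sxx = sum((x - mean_x) ** 2 for x in log_t)
b_start = sxy / sxx
a_start = math.exp(mean_y - b_start * mean_x)

print(round(a_start, 3), round(b_start, 3))
```

The fitted values lie in the neighbourhood of those read from the plot, and either pair would serve as initial estimates.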

The program used to estimate non-linear equations such as (67) 
is called MARQUARDTP. It is located in the public libraries in the 
workspace 32 NONLINEAR and produces least squares estimates of the 
parameters entering non-linearly into an econometric equation. An 
iterative technique is employed; the estimates at each iteration are 
obtained by a method due to Marquardt5/ which combines the Gauss
(Taylor series) method with the method of steepest descent. Since 
any type of mathematical model may be estimated (including linear 
and intrinsically linear models, as is mentioned below), a subprogram 
which defines the equation must be provided by the user. 
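
The flavour of such an iteration can be conveyed by a short Python sketch for equation (67). It is not the MARQUARDTP program: it assumes a common textbook form of Marquardt's method in which the diagonal of J'J is inflated by a factor (1 + lam), it uses analytical derivatives in place of difference quotients, and the names `ssr`, `marquardt_fit` and `lam` are hypothetical.

```python
import math

T = [1.309, 1.471, 1.490, 1.565, 1.611, 1.680]
E = [2.138, 3.421, 3.597, 4.340, 4.882, 5.660]


def ssr(a, b):
    """Residual sum of squares for E = a * T**b."""
    return sum((e - a * t ** b) ** 2 for t, e in zip(T, E))


def marquardt_fit(a, b, lam=0.01, max_iterations=20, tol=0.00001):
    for _ in range(max_iterations):
        old = ssr(a, b)
        # Partial derivatives: d/da = T**b, d/db = a * T**b * ln(T)
        da = [t ** b for t in T]
        db = [a * t ** b * math.log(t) for t in T]
        r = [e - a * t ** b for t, e in zip(T, E)]
        h00 = sum(x * x for x in da)
        h01 = sum(x * y for x, y in zip(da, db))
        h11 = sum(y * y for y in db)
        g0 = sum(x * ri for x, ri in zip(da, r))
        g1 = sum(y * ri for y, ri in zip(db, r))
        while True:
            # Small lam gives a Gauss step; large lam a short gradient step.
            m00, m11 = h00 * (1 + lam), h11 * (1 + lam)
            det = m00 * m11 - h01 * h01
            step_a = (m11 * g0 - h01 * g1) / det
            step_b = (m00 * g1 - h01 * g0) / det
            new = ssr(a + step_a, b + step_b)
            if new <= old or lam > 1e10:
                break
            lam *= 10.0
        if new > old:
            break                      # no improving step was found
        a, b = a + step_a, b + step_b
        lam /= 10.0
        if old - new < tol * old:      # relative change in the sum of squares
            break
    return a, b


a_hat, b_hat = marquardt_fit(0.725, 4.0)
print(round(a_hat, 4), round(b_hat, 4), ssr(a_hat, b_hat))
```

The design point worth noting is the inner loop: a trial step that increases the residual sum of squares is rejected and retried with heavier damping, so the sum of squares never rises from one iteration to the next.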


The program is executed by a statement of the form 


Y MARQUARDTP X 


where Y contains the dependent variable and X the independent 
variable(s). If there are k independent variables in the hypothe- 
sised (non-linear) equation, X should be a matrix whose k columns 
contain the k variables. X may be either a vector or a 1-column 
matrix if the equation specification includes only 1 independent 


variable. In execution the program refers to a subprogram called 


5/ See footnote 3. 




AFN which must be (previously) provided by the user. This contains 
the specification of the hypothesised equation and is used to compute 
the predicted values of the dependent variable (i.e., the y's) which 
are used elsewhere in the program. For example, to define the equation 
given in (67) above, the following subprogram is required: 

      ∇ R←A AFN X
[1]   R←A[1]×X*A[2]
      ∇

where R is (arbitrarily) chosen to represent the result of the 
equation. In this case, A[1] and A[2] are the parameters to be
estimated, A[1] representing a and A[2] representing b in (67).
Any additional parameters in a more complicated equation would be
referred to as A[3], A[4], ..., A[k]. X represents the (single)
independent variable T in (67). Note that for equations containing
k independent variables, variable i (i ≤ k) should be referred to as
column i of the matrix X within the subfunction AFN. For example, 
if the hypothesised equation contains 3 independent variables, these 
should be referred to as X[;1], X[;2] and X[;3], respectively 
(see Example 15 below). 

In addition to the definition of the equation subprogram AFN ,
the specification of certain subsidiary arguments must be completed 
prior to any execution of MARQUARDTP. These arguments are defined by 
a group of state setting functions similar to those in the workspace 


32 REGRESSION (described in Section 3.1). The functions, with default 




values (where applicable) in square brackets are summarised below. 
STATE 

displays the current setting of the state in its entirety. 
DEFAULT 

resets the entire state to its default values. 
INITIAL ESTIMATES X 


defines the initial estimates for each of the parameters in the model 
in the sequence defined above in the subprogram AFN . Note that it
is very important that these initial estimates for the parameter 
values be relatively accurate. Quite apart from the possible saving 
in cost of computation, the program will converge to more reasonable 
final values for the parameters given good initial estimates (recall 
that, generally, non-linear least squares estimates tend to be 
non-unique). 

LOWER LIMITS X [LOWER LIMITS ¯1E75]

UPPER LIMITS X [UPPER LIMITS 1E75] 
sets lower and upper limits on the estimated parameter values. If, 
during any iteration, any parameter estimate exceeds its upper limit 
or becomes less than its lower limit, execution terminates. There 
may be as many lower and upper limits specified as there are 


parameters in the model. 




MINIMUM RSQUARED X [MINIMUM RSQUARED 1]


causes cessation of iteration if the calculated value of R-squared 
exceeds X. Note that this is merely one of four termination 


criteria (these criteria are discussed below). 
PROPORTIONS X [PROPORTIONS 0.01] 


specifies the proportions which are to be used in the calculation 

of the difference quotients which approximate the partial derivatives. 
These values are multiplied by the estimated parameters at each 
iteration to produce the denominators of the difference quotients,
thereby determining the "step size" for the method of steepest 
descent. A value of 0.01 for each parameter seems to work reasonably 


well. 
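
The role of the proportions can be seen in a small Python sketch. It assumes forward differences, with the step for each parameter equal to the proportion times the current parameter value; the actual form used inside MARQUARDTP is not documented here, and the function names are hypothetical.

```python
import math


def model(params, t):
    """Equation (67): E = a * T**b, with params = (a, b)."""
    a, b = params
    return a * t ** b


def difference_quotients(f, params, t, proportion=0.01):
    """Approximate the partial derivatives of f with respect to each parameter.
    Assumes every parameter is non-zero, since the step is proportional to it."""
    base = f(params, t)
    derivatives = []
    for j, p in enumerate(params):
        step = proportion * p
        shifted = list(params)
        shifted[j] = p + step
        derivatives.append((f(shifted, t) - base) / step)
    return derivatives


params = (0.725, 4.0)
t = 1.5
approx = difference_quotients(model, params, t)
exact = [t ** 4.0, 0.725 * t ** 4.0 * math.log(t)]
print(approx)
print(exact)
```

The quotient for a is essentially exact because the model is linear in a; the quotient for b differs slightly from the analytical derivative, and the gap narrows as the proportion is reduced.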
MAXIMUM ITERATIONS X [MAXIMUM ITERATIONS 20] 


sets the maximum number of iterations after which execution is to be 


halted, without regard to proximity to convergence. 
PRINT EVERY X [PRINT EVERY 1]

defines the iterations for which results are to be printed. For example, 
PRINT EVERY 5 


requires that every fifth iteration is to be printed, whereas the 


default setting causes printing at every iteration. The results of 




the final iteration are always printed, even if convergence has been 
attained during an iteration where a printout has not been requested. 
Iteration may cease if any one of four criteria has been 


satisfied: 


1) if the relative change in each parameter becomes 


less than .00001. 


2) if the relative change in the sum of squared 


residuals becomes less than .00001. 


3) if the calculated value of R-squared exceeds the 


specified minimum. 


4) if there has been no convergence after the specified 
maximum number of iterations. In this case, the user 
may wish to start again, using the final estimated 


parameter values as initial parameter estimates. 
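
The four criteria can be expressed as a single check applied after each iteration. The Python sketch below is an illustration only; it assumes that "relative change" means the absolute change divided by the previous value, and the function name, argument names and return strings are hypothetical.

```python
def stop_reason(old_params, new_params, old_ssr, new_ssr, r_squared,
                iteration, min_r_squared=1.0, max_iterations=20, tol=0.00001):
    """Return the criterion that ends the iteration, or None to continue."""
    # 1) relative change in each parameter below tol
    if all(abs(n - o) < tol * abs(o) for o, n in zip(old_params, new_params)):
        return "parameters"
    # 2) relative change in the sum of squared residuals below tol
    if abs(new_ssr - old_ssr) < tol * old_ssr:
        return "sum of squares"
    # 3) R-squared exceeds the specified minimum
    if r_squared > min_r_squared:
        return "r-squared"
    # 4) iteration limit reached without convergence
    if iteration >= max_iterations:
        return "maximum iterations"
    return None


print(stop_reason((1.0, 2.0), (1.000001, 2.000001), 10.0, 9.0, 0.90, 3))
print(stop_reason((1.0, 2.0), (1.1, 2.0), 10.0, 9.99999999, 0.90, 3))
print(stop_reason((1.0, 2.0), (1.1, 2.1), 10.0, 9.0, 0.995, 3, min_r_squared=0.99))
print(stop_reason((1.0, 2.0), (1.1, 2.1), 10.0, 9.0, 0.90, 20))
print(stop_reason((1.0, 2.0), (1.1, 2.1), 10.0, 9.0, 0.90, 3))
```

With the default minimum R-squared of 1 the third criterion can never be met, so it plays a part only when the user lowers it.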


The final parameter values are available upon completion of the execution
of the function MARQUARDTP in a global variable called B. In addition,
the variable M is available; this is an [n×2] matrix containing the
observed and predicted y-values, respectively. These may be useful for
subsequent plotting purposes.

Note that execution will also terminate should one or more of
the parameter estimates stray beyond their specified limits. In this
case, the final parameter estimates are still available in the variable
B, but no other result statistics are printed.

The full output produced by the MARQUARDTP program is illustrated 
in Example 13. Note that one item of information provided during itera- 
tion is the determinant of the matrix which must be inverted in order 
to calculate the correction vector. This determinant should decrease 
during successive iterations. A sudden large drop in the value of the 
determinant is due to ill-conditioning, and suggests that the results 
may be inaccurate. Singularity difficulties can also be foreseen by an 
examination of the magnitude of this determinant. A further indication 
as to the conditioning of the moment matrix (X'X) is provided by the 
ratio of the largest to the smallest of its eigenvalues - the larger 
the ratio, the worse the conditioning. Negative eigenvalues may occur 
if the conditioning is extremely bad, possibly resulting from the 
inclusion of more parameters in the specification than are really 
necessary to explain the data. Removal of parameters, if appropriate, 
can remedy this problem. 
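
The eigenvalue ratio mentioned above is easy to compute for a small case. The following Python sketch handles only a two-column matrix, so that the eigenvalues of the symmetric 2×2 moment matrix have a closed form; the function names and the sample data are hypothetical.

```python
import math


def moment_matrix(rows):
    """Return X'X for a matrix given as a list of two-element rows."""
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    return [[s11, s12], [s12, s22]]


def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    mean = (m[0][0] + m[1][1]) / 2.0
    radius = math.hypot((m[0][0] - m[1][1]) / 2.0, m[0][1])
    return mean + radius, mean - radius


def conditioning_ratio(eigenvalues):
    """Largest eigenvalue divided by the smallest; infinite if not positive."""
    largest, smallest = max(eigenvalues), min(eigenvalues)
    if smallest <= 0:
        return math.inf
    return largest / smallest


# Hypothetical data: distinct columns versus almost identical columns.
well = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
near = [[1.0, 1.000], [1.0, 1.001], [1.0, 0.999]]
ratio_well = conditioning_ratio(eigenvalues_2x2(moment_matrix(well)))
ratio_near = conditioning_ratio(eigenvalues_2x2(moment_matrix(near)))
```

The nearly collinear columns give a ratio several orders of magnitude larger than the well-separated ones, which is the warning sign described in the text.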

The equation hypothesised in (67) above is a non-linear model
which is intrinsically linear; i.e., a simple transformation reduces
the equation to a linear specification which could then be estimated
using the ordinary least squares technique employed in the REGR
program. By taking the natural logarithms of each side of (67) to


Example 13

      ∇ R←A AFN X
[1]   R←A[1]×X*A[2]
      ∇

      INITIAL ESTIMATES 0.725 4
      MAXIMUM ITERATIONS 19

      STATE
INITIAL ESTIMATES      0.725 4
LOWER LIMITS           ¯1E75
UPPER LIMITS           1E75
MINIMUM RSQUARED       1
PROPORTIONS            0.01
MAXIMUM ITERATIONS     19
PRINT EVERY            1

      E MARQUARDTP T

PRELIMINARY ANALYSIS
INITIAL SUM OF SQUARES         4.6218052E¯2

EIGENVALUES OF MOMENT MATRIX
2.2196981E2
3.7162640E¯4

ITERATION NUMBER 1
DETERMINANT                    2.0207138E¯2
ANGLE IN SCALED COORDINATES    81.39 DEGREES.
NEW PARAMETER ESTIMATES
0.763282
3.874577
NEW SUM OF SQUARES             4.4702039E¯3
LAMBDA                         1.00000E¯3

ITERATION NUMBER 2
DETERMINANT                    1.8790762E¯2
ANGLE IN SCALED COORDINATES    82.94 DEGREES.
NEW PARAMETER ESTIMATES
0.768848
3.860516
NEW SUM OF SQUARES             4.3174162E¯3
LAMBDA                         1.00000E¯4

ITERATION NUMBER 3
DETERMINANT                    1.8654454E¯2
ANGLE IN SCALED COORDINATES    79.34 DEGREES.
NEW PARAMETER ESTIMATES
0.768915
3.860257
NEW SUM OF SQUARES             4.3173175E¯3
LAMBDA                         1.00000E¯5

ITERATION NUMBER 4
DETERMINANT                    [illegible]
ANGLE IN SCALED COORDINATES    0.57 DEGREES.
NEW PARAMETER ESTIMATES
0.768915
3.860257
NEW SUM OF SQUARES             4.3173174E¯3
LAMBDA                         1.00000E¯6

ITERATION STOPS. RELATIVE CHANGE IN EACH PARAMETER LESS THAN 0.00001.

APPROXIMATE STATISTICS FROM LINEAR THEORY

EST. PAR.      STD. ERR.      T-VALUE
0.768915       0.018153       42.357844
3.860257       0.050894       75.849003

CORRELATION MATRIX OF THE ESTIMATED PARAMETERS
 1.000000   ¯0.990640
¯0.990640    1.000000

VARIANCE OF RESIDUALS ........ 0.001079
R SQUARED .................... 0.999433
R-BAR SQUARED ................ 0.999291
DURBIN-WATSON STATISTIC ...... 1.948463

OBSERVED Y     ESTIMATED Y    RESIDUALS
2.138000       2.174179       ¯0.036179
3.422000       3.412191        0.009809
3.597000       3.584442        0.012558
4.340000       4.332648        0.007352
4.882000       4.845294        0.036706
5.660000       5.696785       ¯0.036785



obtain

$$\ln E = \ln a + b \ln T , \tag{68}$$

a specification of the form

$$y = \alpha + \beta X$$

where $y = \ln E$, $\alpha = \ln a$, $\beta = b$ and $X = \ln T$ is obtained. This is
identical with equation (6) and thus an ordinary least squares analysis
using (ln E) as the dependent variable and (ln T) as the independent
variable will provide the estimates of $\alpha$ and $\beta$. The antilog of the
estimated intercept coefficient ($\hat{\alpha}$), together with the estimated slope
coefficient $\hat{\beta}$, are then the required parameter estimates of a and b
respectively in (67).

However, note that all equation statistics (R², etc.) resulting
from the OLS estimation of (68) will be measured in terms of the new
dependent variable (ln E). These statistics should be recalculated
to reflect variance in the dependent variable E (see Section 4.10).
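
The log-linear route and the recalculation of R-squared can be illustrated with a short Python sketch. The data below are invented for the illustration and are not the data of Examples 13 and 14; the function names are hypothetical, and the recalculation shown (one minus the ratio of residual to total sum of squares in the units of E) is one common choice that may differ in detail from the procedure of Section 4.10.

```python
import math


def fit_power_law_by_ols(t, e):
    """Estimate a and b in E = a * T**b by regressing ln E on ln T."""
    x = [math.log(v) for v in t]
    y = [math.log(v) for v in e]
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # The antilog of the intercept estimates a; the slope estimates b.
    return math.exp(intercept), slope


def r_squared(observed, fitted):
    """One minus residual sum of squares over total sum of squares."""
    mean = sum(observed) / len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss / tss


# Invented observations, for illustration only.
t = [1.1, 1.2, 1.3, 1.4, 1.5]
e = [1.05, 1.60, 2.10, 2.95, 3.75]
a, b = fit_power_law_by_ols(t, e)

# R-squared as the regression reports it (in terms of ln E) ...
r2_logs = r_squared([math.log(v) for v in e],
                    [math.log(a) + b * math.log(v) for v in t])
# ... and recalculated in terms of E itself.
r2_levels = r_squared(e, [a * v ** b for v in t])
```

The two R-squared values generally differ, because the first measures explained variance in ln E and the second in E.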

The estimated parameters produced by the MARQUARDTP program in
estimating (67), and those produced by an OLS analysis using REGR in
(68) are almost identical; they are not exactly the same because of
the iterative technique and arbitrary convergence criteria imposed by
the MARQUARDTP program. These results are presented as Examples 13
(above) and 14, respectively, for the problem outlined at the beginning
of this section.


Example 14

[REGR output for the ordinary least squares estimation of (68). The
printout reports the correlation matrix (with t-values), the mean of
the dependent variable, the estimated coefficients with their standard
errors and t-values, the analysis of variance table, the multiple
correlation coefficient (R²), the corrected R², the F-statistic for
significance of regression (1, 4), the standard error of the estimate,
the Durbin-Watson statistic, the coefficient of variation (at the mean
of Y) and the variance-covariance matrix of the estimated coefficients.
Entries that remain legible: R² = 0.999536, corrected R² = 0.999420,
F-statistic = 8616.94.]



In general, most non-linear equations will not be intrinsically
linear, and MARQUARDTP will be the only available method of estimating
the parameters. One such example is the transformed equation (27) for
autocorrelation

$$Y_t = \alpha(1 - \rho) + \beta X_t - \rho\beta X_{t-1} + \rho Y_{t-1} + \varepsilon_t \tag{27}$$



Note that there are three parameters ($\alpha$, $\beta$ and $\rho$) to be estimated
in this example. The subprogram AFN appropriate for this application
is defined at the top of Example 15. It is assumed that X is an
[n×3] matrix containing $X_t$, $X_{t-1}$ and $Y_{t-1}$ as the columns in that
order. Consequently

        A[1] represents $\alpha$ in (27)
        A[2] represents $\beta$ in (27)
    and A[3] represents $\rho$ in (27).

Note that, since APL functions read from right to left, it is convenient
to include the interactive coefficient term ($-\beta\rho X_{t-1}$) on the right in
the definition of AFN.
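
For readers less familiar with APL, the same model function can be sketched in Python. The names `afn` and `lagged_rows` are hypothetical, as are the sample series; the column order follows the description above (current X, lagged X, lagged Y).

```python
def afn(a, rows):
    """Systematic part of the transformed autocorrelation equation (27).

    a    = (alpha, beta, rho)
    rows = one tuple (x_t, x_lagged, y_lagged) per observation
    """
    alpha, beta, rho = a
    return [alpha * (1 - rho) + beta * x_t - rho * beta * x_lag + rho * y_lag
            for x_t, x_lag, y_lag in rows]


def lagged_rows(x, y):
    """Build the three columns from the full series.

    The first observation is lost because it has no lagged values.
    """
    return [(x[t], x[t - 1], y[t - 1]) for t in range(1, len(x))]


# Invented series in which y is exactly 2*x, so alpha = 0 and beta = 2
# reproduce y for any value of rho.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
fitted = afn((0.0, 2.0, 0.5), lagged_rows(x, y))
```

The dependent variable passed to the estimation routine must be shortened in the same way, dropping its first observation so that it lines up with the lagged columns.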

Example 15 illustrates the use of the MARQUARDTP program to esti-
mate equations like (27) which contain more than one independent variable
and are not intrinsically linear. It employs the same data as was
used for the autocorrelation examples presented in Section 6.4 (i.e.,
Examples 4 to 6) and for the generalized least squares example presented
in Section 7.2 (i.e., Example 7). In this case the program converges


Example 15

      ∇ R←A AFN X
[1]   R←(A[1]×1-A[3])+(A[2]×X[;1])+(A[3]×X[;3])-X[;2]×A[2]×A[3]
      ∇

      INITIAL ESTIMATES 6494.89 .80251,1-.7015÷2

      MAT←⍉3 47⍴(1↓PDI),(¯1↓PDI),¯1↓CON

      (1↓CON) MARQUARDTP MAT

PRELIMINARY ANALYSIS
INITIAL SUM OF SQUARES         1.5290439E7

EIGENVALUES OF MOMENT MATRIX
2.5195178E¯1
2.1889835E10
2.6487772E7

ITERATION NUMBER 1
DETERMINANT                    4.5617689E¯2
ANGLE IN SCALED COORDINATES    37.80 DEGREES.
NEW PARAMETER ESTIMATES
7231.295140
0.790749
0.650680
NEW SUM OF SQUARES             1.5140208E7
LAMBDA                         1.00000E¯3

ITERATION NUMBER 2
DETERMINANT                    4.1180892E¯2
ANGLE IN SCALED COORDINATES    70.83 DEGREES.
NEW PARAMETER ESTIMATES
7308.360493
0.789528
0.665427
NEW SUM OF SQUARES             1.5133052E7
LAMBDA                         1.00000E¯4

ITERATION NUMBER 3
DETERMINANT                    4.0643564E¯2
ANGLE IN SCALED COORDINATES    65.21 DEGREES.
NEW PARAMETER ESTIMATES
7351.730679
0.788822
0.667833
NEW SUM OF SQUARES             1.5132477E7
LAMBDA                         1.00000E¯5

ITERATION NUMBER 4
DETERMINANT                    4.0380866E¯2
ANGLE IN SCALED COORDINATES    73.78 DEGREES.
NEW PARAMETER ESTIMATES
7361.034022
0.788672
0.668911
NEW SUM OF SQUARES             1.5132428E7
LAMBDA                         1.00000E¯6

ITERATION STOPS. RELATIVE CHANGE IN SUM OF SQUARES LESS THAN 0.00001.

APPROXIMATE STATISTICS FROM LINEAR THEORY

EST. PAR.       STD. ERR.      T-VALUE
7361.034022     1545.599792    4.762574
0.788672        0.024855       31.730964
0.668911        0.149034       4.488312

CORRELATION MATRIX OF THE ESTIMATED PARAMETERS
 1.000000   ¯0.985796    0.534515
¯0.985796    1.000000   ¯0.523524
 0.534515   ¯0.523528    1.000000

VARIANCE OF RESIDUALS ........ 343918.812346
R SQUARED .................... 0.996682
R-BAR SQUARED ................ 0.996531
DURBIN-WATSON STATISTIC ...... 2.358959



in four iterations to almost exactly the same parameter estimates
generated in the abovementioned examples (especially Examples 5 and
6). All other equation statistics are also very similar.

Finally, it can be noted that it is possible for the user to
obtain additional accuracy by supplying analytic solutions to the
partial derivatives (if available) rather than using the numerical
approximations used in the library version of the MARQUARDTP program.
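
The gain from analytic derivatives can be seen for the power function E = a T^b of (67), whose partial derivatives are T^b with respect to a and a T^b ln T with respect to b. The Python sketch below compares them with forward differences taken at a proportion of 0.01; the function names and the evaluation point are hypothetical, and the forward-difference form is an assumption about the library program.

```python
import math


def analytic_partials(a, b, t):
    """Exact partial derivatives of E = a * T**b with respect to a and b."""
    return [(ti ** b, a * ti ** b * math.log(ti)) for ti in t]


def numerical_partials(a, b, t, proportion=0.01):
    """Forward-difference approximations with increments proportion*parameter."""
    ha, hb = proportion * a, proportion * b
    rows = []
    for ti in t:
        base = a * ti ** b
        rows.append((((a + ha) * ti ** b - base) / ha,
                     (a * ti ** (b + hb) - base) / hb))
    return rows


# Hypothetical evaluation point.
exact = analytic_partials(0.75, 4.0, [1.5])
approx = numerical_partials(0.75, 4.0, [1.5])
error_in_b = abs(approx[0][1] - exact[0][1])
```

Because the model is linear in a, the difference quotient for a is exact apart from rounding, whereas the one for b carries a visible truncation error; this is the inaccuracy that analytic derivatives remove.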




BIBLIOGRAPHY 
An annotated bibliography of text books in econometrics. 


Notes: This bibliography does not include a complete listing of 
texts in statistics and time-series analysis, both of 
which do include many topics of importance to the sub- 
ject matter of econometrics. 


Annotations are included for convenience only and the 
lack of an annotation should not be taken as a reflection 
of the quality of the book -- it only implies that the 
book was not readily available at the time that this bib- 
liography was compiled. 


Books marked with an asterisk (*) are the main references 
cited in this document. 


Aigner, D.J.; Basic Econometrics, (Prentice-Hall, N.J., 1971). 


Anderson, T.W.; An Introduction to Multivariate Statistical 
Analysis, (Wiley, N.Y., 1958). 


Anderson, T.W.; Time Series Analysis, (Wiley, N.Y., 1971). 


Beals, R.E.; Statistics for Economists: An Introduction, (Rand 
McNally, Chicago, 1972). A very clear, modern introduction 
to econometrics with particular emphasis on the underlying 
statistical concepts. Includes multiple regression, an 
introduction to the problems and to simultaneous equations. 
Recommended as an introductory text. 


Brennan, M.J.; Preface to Econometrics: An Introduction to
Quantitative Methods in Economics, Second Edition, (South-
Western, Cincinnati, 1965). Also an introductory text.


Bridge, J.L.; Applied Econometrics, (North-Holland, Amsterdam,
1971).


Christ, C.F.; Econometric Models and Methods, (Wiley, N.Y., 1966).
A very good, medium to advanced text covering all topics
with an orientation towards applications to economic models.


Chu, K.; Principles of Econometrics, Second Edition, (Intext Edu- 
cational, Scranton, 1972). 




Cramer, J.S.; Empirical Econometrics, (North-Holland, Amsterdam, 
1969). Not a text. Concentrates on applications of econo- 
metrics in economics. Includes chapters on linear regression 
and simultaneous equations, together with chapters on random 
arrivals, consumer behaviour, size distributions, family 
budgets, consumption, demand and production. 


Desai, M.; Applied Econometrics, (Philip Allan, Oxford, 1976). 
Examines different applications, such as demand analysis, 
production functions, technical change, investment, wages 
and prices. 


Dhrymes, P.J.; Econometrics: Statistical Foundations and Applications,
(Harper and Row, N.Y., 1970). An advanced treatment of multi-
variate analysis, simultaneous systems and spectral analysis.
One of the few econometrics texts which includes the latter
topic.


Dhrymes, P.J.; Distributed Lags: Problems of Estimation and Formulation,
(Holden-Day, San Francisco, 1971). A modern,
advanced treatment of the estimation of distributed lags
by a considerable variety of methods. A specialised text.


Dowling, J.M. and F. R. Glahe (eds.); Readings in Econometric Theory, 
(Colorado Associated University Press, Boulder, 1970). A very 
useful collection of 27 of the original articles on a variety 
of subjects. 


Draper, N.R. and H. Smith; Applied Regression Analysis, (Wiley, 
N.Y., 1966). 


Dutta, M.; Econometric Methods, (South-Western, Cincinnati, 1975). 


Ezekiel, M. and K.A. Fox; Methods of Correlation and Regression
Analysis, Third Edition, (Wiley, N.Y., 1959).


Fisher, F.M.; The Identification Problem in Econometrics, (McGraw-
Hill, N.Y., 1966). A comprehensive advanced treatment of a
particular topic.


Fishman, G.S.; Spectral Methods in Econometrics, (Harvard University
Press, Cambridge, 1969). A standard text in this field.


Fox, K.A.; Econometric Analysis for Public Policy, (Iowa State
University Press, Ames, 1958). A collection of articles
on demand analysis and econometric models representing
research done between 1951 and 1955.


Goldberger, A.S.; Econometric Theory, (Wiley, N.Y., 1964). A 
standard, medium level text used extensively in the late 
1960s and still a very good source of material. Covers 
all important topics. 


Goldberger, A.S.; Topics in Regression Analysis, (Macmillan, N.Y., 
1968). A Less medium level reference book covering a 
variety of topics. 


Goldfeld, S.M. and R.E. Quandt; Non-Linear Methods in Econometrics, 
(North-Holland, Amsterdam, 19 


Goldfeld, S.M. and R.E. Quandt; Studies in Non-Linear Estimation, 
(Ballinger, Cambridge, 1976). 


Graybill, F.A.; Theory and Applications of the Linear Model,
(Duxbury, Mass., 1976).

Haitovsky, Y.; Regression Estimation from Grouped Observations, 
(Griffin, London, 1973). 

Hannan, E.J.; Time Series Analysis, (Methuen and Co., London, 
1960). A medium-level text outlining the details of time- 
series analysis. 

Hanushek, E.A. and J.E. Jackson; Statistical Methods for Social 
Scientists, (Academic, N.Y., 1977). A very good book at 
a beginning-to-intermediate level with a wider range of 
applications than the usual econometrics text. 

Hood, W.C. and T.C. Koopmans (eds.); Studies in Econometric Method,
(Wiley, N.Y., 1953). Contains many articles of a pioneering
nature. Still an important reference for many topics.


Hooper, J.W. and M. Nerlove (eds.); Selected Readings in Econometrics 
from Econometrica, (M.I.T. Press, Cambridge, 1970). 

Hu, T.W.; Econometrics: An Introductory Analysis, (University Park,
Baltimore, 1973). Covers all topics at an introductory level
without the use of matrix algebra. Looks like a good book
for those with a limited background in the subject.


Huang, D.S.; Regression and Econometric Methods, (Wiley, N.Y., 1970). 
A good, medium-level text covering all topics. 


Intriligator, M.D., (ed.); Frontiers of Quantitative Economics,
(North-Holland, Amsterdam, 1971).


Intriligator, M.D. and D.A. Kendrick (eds.); Frontiers of Quanti-
tative Economics, Vol. II (North-Holland, Amsterdam, 1974).


*Johnston, J.J.; Econometric Methods, Second Edition (McGraw-Hill, 
N.Y., 1972). An excellent, medium-level text covering all 
important topics. Highly recommended for those with the 
necessary introductory background material. 


Kane, E.J.; Economic Statistics and Econometrics, (Harper and Row, 
N.Y., 1968). An excellent, introductory treatment of the 
necessary statistical concepts and an introduction to 
regression analysis. The book concludes with a chapter on 
the problems of single equation estimation. Recommended 
for the uninitiated! 


Kelejian, H.H. and W.E. Oates; Introduction to Econometrics:
Principles and Applications, (Harper and Row, N.Y., 1974). A
very useful book to prepare the beginner for an inter-
mediate course in econometrics. Covers all important
topics. Recommended.


Klein, L.R.; An Introduction to Econometrics, (Prentice-Hall, N.J., 
1962). This book examines econometrics by looking at appli- 
cations to the standard economic models of demand, production 
and cost, income distribution and growth cycles. 


Klein, L.R.; A Textbook of Econometrics, Second Edition, (Prentice-
Hall, N.J., 1974). A revised and rewritten version of
Klein's 1952 text. A medium-level book which includes some
statistical background and a discussion and an application
of simultaneous equations and computation methodology. It
does not cover the single equation problems.


*Kmenta, J.; Elements of Econometrics, (Macmillan, N.Y., 1971). An 
excellent text covering all topics with worked examples at 
an introductory to medium level. Matrix algebra is not used 
until Chapter 10, which reflects the gradually increasing 
sophistication through the book. Highly recommended. 


Koutsoyiannis, A.; Theory of Econometrics: An Introductory Exposi-
tion of Econometric Methods, (Macmillan, London, 1973).


Maddala, G.S.; Econometrics, (McGraw-Hill, N.Y., 1977). A recent
text which covers all important topics including forecasting
and Bayesian methods in econometrics.


Madansky, A.; Foundations of Econometrics, (American Elsevier, N.Y.,
1976).


Malinvaud, E.; Statistical Methods in Econometrics, translated by
A. Silvey, (Rand McNally, Chicago, 1970). A complete,
advanced text covering all topics. An excellent reference
book.


Murphy, J.L.; Introductory Econometrics, (Irwin, Homewood, 1973). 
A medium-level text which is introductory in nature. 
Covers all important topics. 


Nelson, C.R.; Applied Time Series Analysis for Managerial Fore-
casting, (Holden-Day, San Francisco, 1973). A good place
to start for an introduction to time-series analysis.


Neter, J. and W. Wasserman; Applied Linear Statistical Models,
(Irwin, Homewood, 1974).


Pindyck, R.S. and D.L. Rubinfeld; Econometric Models and Economic 
Forecasts (McGraw-Hill, N.Y., 1976). A very good, medium- 
level book concentrating more on multiple equation models 
than most texts. Covers most important topics plus chapters 
on forecasting, simulation and stochastic time-series. 
Contains numerous interesting applied examples. 


Plackett, R.L.; Principles of Regression Analysis, (Oxford University
Press, Oxford, 1960; also Clarendon, Oxford, 1958).


Poirier, D.J.; The Econometrics of Structural Change with Special
Emphasis on Spline Functions, (North-Holland, Amsterdam,
1976).


Rao, P. and R.L. Miller; Applied Econometrics, (Wadsworth, Belmont, 
Calif., 1971). A useful, relatively introductory-level text 
with examples. Covers most important topics. 

Scheffe, H.; The Analysis of Variance, (Wiley, N.Y., 1959). A
medium-level, complete treatment of an important topic.

Schmidt, P.; Econometrics, (Marcel Dekker, N.Y., 1976).

Surrey, M.J.C.; An Introduction to Econometrics, (Clarendon Press, 
Oxford, 1974). A useful, short paperback book with an 
appropriate title. Does not use matrix algebra. 

*Theil, H.; Principles of Econometrics, (Wiley, N.Y., 1971).
Possibly the best advanced text available. Covers all
topics at all levels of complexity.


Tinbergen, J.; Econometrics, translated by H. Rijken van Olst, 
(Allen and Unwin, London, 1951). Translation of a Dutch 
book, by this Nobel prize winning author, published several 
years earlier. A classic! 


Tintner, G.; Econometrics, (Wiley, N.Y., 1952). In spite of its 
early vintage this is still a useful reference, especially 
for examples in multivariate analysis and time-series analysis 
of cyclical fluctuations. 


Valavanis, S.; Econometrics: An Introduction to Maximum-Likelihood
Methods, (McGraw-Hill, N.Y., 1959).


Wallis, K.F.; Introductory Econometrics, (Gray-Mills, London, 1972).


Wallis, K.F.; Topics in Applied Econometrics, (Gray-Mills, London,
1973). Concerned with the inter-relation between economic
theory and econometrics, this book contains applications to
the consumption, production and investment functions and to
simultaneous systems.


Walters, A.A.; An Introduction to Econometrics, (Macmillan, London,
1968). A good, introductory text with examples. Does not 
cover much of the single equation problems, but does present 
applications to consumption, production and linear program- 
ming. 


Williams, E.J.; Regression Analysis, (Wiley, N.Y., 1959). 


Wynn, R.F. and K. Holden; An Introduction to Applied Econometric 
Analysis, (Macmillan, London, 1974). 


Wonnacott, R.J. and T.H. Wonnacott; Econometrics, (Wiley, N.Y.,
1970). A very useful text divided into two parts: I,
presenting material at an introductory level and II, which
parallels Part I, covering and extending the same material
at a more advanced level. Covers all important topics using
the trigonometric approach. Complemented by the text entitled
Introductory Statistics (1969) by the same two authors.


Zarembka, P. (ed.); Frontiers in Econometrics, (Academic, N.Y., 1974).


Zellner, A.; An Introduction to Bayesian Inference in Econometrics,
(Wiley, N.Y., 1971). An excellent advanced treatment of a
neglected topic in econometrics. Both Bayesian and non-
Bayesian results are presented and evaluated.


Zellner, A. (ed.); Readings in Economic Statistics and Econometrics,
(Little, Brown, Boston, 1968). A good selection of the
applications of econometrics to a variety of economic problems
including economic measurement, consumer behaviour, firm
behaviour and macro models.


INDEX 


APL, 8, 11, 43, 48 
example (see example ) 
program (see program) 
Adaptive expectations, 175 
Almon lags, 176, 191 
Analysis of covariance 
(See covariance analysis) 
Analysis of variance, 31, 38, 94, 
98 
Asymptote, 59, 65 
Autocorrelation, 43, 119, 141, 186 
first-order, 119, 120, 142 
non-autocorrelation, 27 
tests for, 120 


Bayesian econometrics, 147 
Best linear unbiased estimators 
(BLUE) 
generalized least squares, 118, 
120,140 
ordinary least squares, 27, 102, 
112 
Bias, 67, 102, 135, 138, 175, 203, 
208 


CANSIM Mini-Base 10, 130 

Catenate, 21 

Chow, G.C., 96 

Cobb-Douglas specification, 61, 79, 
110 


Cochrane-Orcutt procedure, 127 
Coefficient of variation, 28, 46 
Constant (see intercept) 
Constraint (see restriction) 
Conditional prediction (see predic- 
tion) 

Confidence interval, 121, 158, 164 
Consistent estimator, 20, 203
Corrected R2 (or R 
Correlation 

coefficient, 34, 35, 86 

matrix, 34, 40, 106 

multiple correlation, 30 

Spearman rank coefficient, 116 
Covariance analysis, 95, 99 


Covariances, 41, 85 
Cross-section, 23, 154 


Degrees of freedom, 28, 29, 32,
35, 84, 87, 99, 97, 100, 114,
162
Dependent variable (see variable) 
Deviation, 77 
Dichotomous variable (see vari- 
able) 
Dimension, 2, 14 
Differencing 
first, 78, 134, 191 
generalized, 124, 127 
Distributed lag (see lag distri- 
bution) 
Distribution (see also lag dis- 
tribution) 
chi, 149 
F, See 39 
normal, 29, 84, 121, 149, 194 
t, 29, 160 
uniform, 148, 188 
Dummy variable (see variable) 
Durbin, J., 122
Durbin-Watson 
statistic, 33, 50, 125, 137, 
175, 186 
test, 129 


Eigenvalue, 223, 237 
Endogenous variable (see variable) 
Error 
equation, 4, 40, 66 
specification, 109 
sum of squares, 26 
type I, 83, 121 


type II, 83 
variable, 27, 138 
Estimator 


generalized least squares, 140 
instrumental variable, 205
non-linear, 230 

ordinary least squares, 26 
simultaneous, 208 


three-stage least squares, 214 
two-stage least squares, 212 
Example
1 REGR I, 45
2 REGR II, 47
3 REGRESS, 51
4 REGR III, 131
5 COCHRANEAORCUTT, 132
6 HILDRETHALU, 133
7 GLS, 143
8 PREDICT, 163
9 PDLAG, 187
10 SHILLER, 202
11 STAGE3, 218
12 PRINCIPAL, 225
13 MARQUARDTP I, 238
14 REGR IV, 241
15 MARQUARDTP II, 243
Exogenous variable (see variable,
independent)
Extraneous estimate, 108, 147, 196

Farrar-Glauber, 107 
equations, 117 

First difference, 78 

Fisher, F.M., 220 

Forecast (see prediction) 


Generalized differencing (see dif-
ferencing)
Generalized least squares (GLS),
102, 111, 120, 123, 140
estimators, 111, 140 
prediction, 164 
Geometric lag (see lag distribu- 
tion) 
Glejser regressions, 116 
Goldberger, A., 112 
Goldfeld-Quandt test, 113 
Goodness of fit (see R2) 
Grouping 
data, 144 
equations, 152 


Haitovsky, Y., 31 
Heteroscedasticity, 40, 43, 110, 
141, 145, 149 
correction for, 117 




definition of, 110 
tests for, 119 
Hildreth-Lu procedure, 129 
Homogeneity, 95, 101 
Homoscedasticity, 27, 113, 146, 
194 


(See also heteroscedasticity) 


Identification, 209 
Independent variable (see vari- 
able) 
Indexing, 17 
Indirect least squares (ILS), 207 
Initial parameter estimates, 234 
Instrument (see also variable), 
204, 210 
Instrumental variable (see vari- 
able) 
structurally ordered, 220 
Interactive dummy variable, 75, 
90 
Iteration, 116, 127, 129, 232 
iterative 3SLS, 215 
Intercept, 4, 36, 69, 72, 77, 90
Interval estimate (see confidence 
interval) 
Invert 
matrix inversion, 42, 104 
inverted distribution, 188 


Johnston, J.J., throughout text 


Keeping, E.S., 231 

Klein, L.R., 215 

Kmenta, J., 125, 146, 148, 155, 
171 

Koyck transformation, 170, 176 


Lag distribution, 166 
Almon, 176 
geometric, 169, 184, 188 
inverted V, 188 
Pascal, 189 
polynomial, 176 
Shiller, 190 
Least squares 
generalized, 102, 140, 153, 
164, 195 


indirect, 207 
non-linear, 230 
ordinary, 26, 157 
three-stage, 214 
two-stage, 211 

Linearity 
prior information and, 147 
restrictions, 150 
tests for, 151 

Logarithm, 60, 63 


Marquardt, D.W., 231 
Massager, 8 
Matrix, 2 
addition of, 53 
data, 3, 8, 103-12, 48 
determinant of, 237 
diagonal, 28, 117 
inverse of, 104 
Kronecker product of, 154 
multiplication of, 55, 57, 75 
partitioned, 9 
singular, 42, 197, 237 
symmetric, 111 
transposition of, 13 
Mean, 66 
Measurement error, 27 
Misspecification, 135 
Mixed estimation, 147 
Model, 1, 9, 207 
Multicollinearity, 27, 40, 103, 
168, 190, 222 
definition, 103 
perfect, 42, 104 
tests for, 106 


Non-linear 
estimation, 116, 135, 230 
program, 231 
specification, 55 


Observation, 2, 6, 8 

Ordinary least squares (OLS), 26 
estimators, 27 

Origin, 36, 77 


Parameter, 4, 7 
tightness, 195 
Partial adjustment, 175 




Point estimate, 158, 165 
Pooling, 76, 95, 155, 161 
Prediction 

with GLS, 164 

with OLS, 157 
Principal component, 221 
Prior information, 77, 147, 191 

(See also extraneous estimate) 
Program 

COCHRANEAORCUTT, 217 

CONVERT, 68 

GLS, 141 

HILDRETHALU, 129 

MARQUARDTP, 231

PDLAG, 184 

PREDICT, 162 

PRINCIPAL, 223 

REGR, 35 

REGRESS, 48 

SHILLER, 199 

STAGE2, 212 

STAGE3, 215 


R-squared (R2), 30, 77, 93, 146 
Random variable (see variable) 
Rank condition, 26, 104 
Rectangular hyperbola, 63 
Recursive, 208, 210 
Reduced form, 208 
Regression, 6 
curvilinear, 58, 61, 64 
log linear, 60, 230 
polynomial, 54, 89, 230 
reciprocal, 58, 230 
Regressor (see variable, indepen- 
dent) 
Residual (see error) 
Restriction 
in polynomial distributed lags, 
180 


in Shiller lags, 198 
linear, 150 

prior, 77, 80, 84 
stochastic, 193 


Sample, 157, 161, 165, 220 
Sargan, J.D., 213
Scalar, 3 


Scaling, 53, 58 
Seasonal 
adjustment, 73, 80, 90 
dummy variables, 73, 90, 103 
Seemingly unrelated regressions,
53


Serial correlation, 119 
Shiller, R.J., 190 
lags, 190 
Significance 
level, 83, 87 
tests, 35, 39 
Simultaneous equations, 207 
Specification error, 135 
Standard deviation (see standard 
error) 
Standard error 
of a coefficient, 28, 105 
of a prediction, 159, 166 
of the estimate, 28 
Standardized variable (see vari- 
able) 
Structural form, 207 
Structurally ordered instrumental 
variables (SOIV), 220 
Sum of squares, 39 
error, 27 
explained, 30, 98 
residual, 26, 27, 29, 39, 98 
total, 29, 39, 98 
System of equations, 207 


Test 
Chi-squared, 149 
Chow, 96 
Durbin-Watson (d), 120 
F, 91, 39, 935 107, TIA; 151; 21614 
h, 122 
nested, 89 
normal, 121 
One-tail, 29, 86, 90 
t, 29, 35, 84, 88, 105, 107, 116,
161 
Two-tail, 29, 86, 161 
Theil, Hi, 67, 77; 118, 142, 149, 
Hl, 165s. 1065121552217 
Three-stage least squares, 214 
Time-series, 23, 154 


Time trend, 68, 134 
Transformation 
autocorrelation (Koyck), 123, 
126, 170, 176
double log, 61, 137
log-log, 61 
logarithmic, 61 
reciprocal, 58 
semi-log, 61 
Turning point, 57, 65, 70 
Two-stage least squares, 211 


Uniform distribution (see distri- 
bution) 

Units of measurement, 54, 65, 173, 
205, 222, 240 


Variable, 1 
dependent, 4, 7, 28, 41, 62, 65 
dichotomous, 72 
dummy, 72, 75, 80, 90, 103 
endogenous, 207 
exogenous, 207 
explanatory (see independent) 
independent, 4, 7, 31, 40, 103 
instrumental, 203 
lagged, 70, 166 
regressor, 36, 39 
standardized, 122, 222 
structurally ordered, 220 
Variance, 26, 77 
Variance-covariance matrix, 27, 
28, 41, 55, 85, 112, 199, 179,
205 
Vector, 2, 7 
addition of, 53 
multiplication of, 75 
partition of, 8, 9 
of residuals, 5, 7, 40 
von-Neumann ratio, 121 


Zellner, A., 147, 217 





